PROSO tries to answer the following question:

"Which of my cloned proteins have the best/worst chances to be soluble upon heterologous expression?"

The prediction is based on a classifier exploiting subtle differences between soluble proteins from TargetDB and the PDB and notoriously insoluble proteins from TargetDB. For more details please read our brief method description.

You can also download supplementary materials.

The method is based on around 80,000 proteins. Evaluated by 10-fold cross-validation, it achieved an accuracy of 71%, an area under the ROC curve of 0.785 and an MCC (Matthews correlation coefficient) of 0.422.
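For readers unfamiliar with these measures, the following is a minimal sketch of how accuracy and MCC are computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions, not PROSO's evaluation code or its actual results.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, chosen only to illustrate the calculation.
print(accuracy(tp=40, tn=30, fp=20, fn=10))
print(round(mcc(tp=40, tn=30, fp=20, fn=10), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.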

The input protein sequences are categorized into two classes: soluble and insoluble. In addition, a solubility score between 0 and 1 is provided for each sequence.

The score threshold is set to 0.6 by default.
Setting it higher increases classification precision (selectivity).
On the other hand, decreasing the threshold results in higher recall (sensitivity).
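The effect of the threshold can be illustrated with a small sketch. It assumes that a sequence is called soluble when its score is at or above the threshold (the handling of a score exactly equal to the threshold is an assumption), and the scores and labels are made-up toy data, not PROSO output.

```python
def classify(score, threshold=0.6):
    """Assign a class from a solubility score (assumed rule: score >= threshold)."""
    return "soluble" if score >= threshold else "insoluble"

def precision_recall(scores, is_soluble, threshold):
    """Precision and recall for the 'soluble' class at a given threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(predicted, is_soluble))
    fp = sum(p and not t for p, t in zip(predicted, is_soluble))
    fn = sum(not p and t for p, t in zip(predicted, is_soluble))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical scores with their (hypothetical) experimental outcomes.
scores = [0.9, 0.8, 0.65, 0.55, 0.4, 0.3]
is_soluble = [True, True, False, True, False, False]

for threshold in (0.5, 0.6, 0.7):
    precision, recall = precision_recall(scores, is_soluble, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.7 removes the false positive but loses one truly soluble protein, which is the trade-off described above.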

The classifier is intended for proteins without transmembrane segments (please use TMHMM to check your sequences).
If the result cannot be calculated for a sequence, a comment is written instead.

Submit your sequence:

Input sequences
Please provide up to 20 unambiguous protein sequences (only the 20 standard amino acid symbols are allowed), each between 21 and 2000 amino acids long, in multiple FASTA format.
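To check a file against these limits before submitting, something like the following sketch could be used. It is not the server's own validation: the function names are hypothetical, the length bounds are assumed to be inclusive, and lowercase letters are assumed to be acceptable and are converted to uppercase.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid symbols
MAX_SEQUENCES = 20
MIN_LEN, MAX_LEN = 21, 2000            # assumed to be inclusive bounds

def parse_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def validate(records):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences given, at most {MAX_SEQUENCES} allowed")
    for header, seq in records:
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            problems.append(f"{header}: length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append(f"{header}: symbols not allowed: {''.join(bad)}")
    return problems

# Hypothetical input: the second record is too short and contains 'X'.
example = ">p1\nACDEFGHIKLMNPQRSTVWY\nACDEF\n>p2\nACDX\n"
for problem in validate(parse_fasta(example)):
    print(problem)
```

Ambiguity codes such as X, B or Z are reported as disallowed symbols, in line with the requirement that sequences be unambiguous.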

Sample of input sequence