SANN: Solvent accessibility prediction of proteins by nearest neighbor method
Published in: | Proteins: Structure, Function, and Bioinformatics, 2012-07, Vol.80 (7), p.1791-1797 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | We present a method for predicting the solvent accessibility of proteins, based on a nearest neighbor method applied to sequence profiles. The method yields continuous real‐value predictions as well as two‐state and three‐state discrete predictions. It uses the z‐score of the distance measure in the feature vector space to estimate the relative contribution of each of the k nearest neighbors to the discrete and continuous solvent accessibility predictions. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a sequence identity cutoff of 25%. Using optimal parameters, the method achieves discrete prediction accuracies of 78.38% (two‐state prediction with a threshold of 25%) and 65.1% (three‐state prediction with thresholds of 9% and 36%), and a Pearson correlation coefficient between the predicted and true relative solvent accessibility (RSA) values of 0.676 for continuous prediction. In an independent benchmark test on the CASP8 targets, the proposed method outperforms existing methods: the prediction accuracies are 80.89% (two‐state prediction with a threshold of 25%) and 67.58% (three‐state prediction), and the Pearson correlation coefficient is 0.727 (continuous prediction) with a mean absolute error of 0.148. We also investigated the effect of database size on prediction accuracy and observed further improvement as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/. Proteins 2012; © 2012 Wiley Periodicals, Inc. |
---|---|
ISSN: | 0887-3585 1097-0134 |
DOI: | 10.1002/prot.24074 |
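
The summary describes a nearest neighbor prediction weighted by the z-score of the distance, followed by thresholding into two or three states. The sketch below illustrates that idea only; it is not the published SANN implementation. The Euclidean distance, the specific weighting (negative z-score clipped at zero), the state labels, the handling of values exactly at a threshold, and all function names are assumptions made for illustration. Only the thresholds (25%, and 9% and 36%) come from the summary.

```python
import math
import statistics


def predict_rsa(query, db_features, db_rsa, k=3):
    """Illustrative z-score-weighted k-nearest-neighbor estimate of RSA."""
    # Distance from the query to every database vector (Euclidean is assumed).
    dists = [math.dist(query, f) for f in db_features]
    mean = statistics.fmean(dists)
    std = statistics.pstdev(dists)
    # Indices of the k closest database entries.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    if std > 0:
        # Assumed weighting: a neighbor that is closer than average has a
        # negative z-score, so its weight is -z, clipped at zero.
        weights = [max(-(dists[i] - mean) / std, 0.0) for i in nearest]
    else:
        weights = [0.0] * len(nearest)
    total = sum(weights)
    if total == 0.0:
        # Fall back to a plain average if no neighbor gets a positive weight.
        weights = [1.0] * len(nearest)
        total = float(len(nearest))
    return sum(w * db_rsa[i] for w, i in zip(weights, nearest)) / total


def two_state(rsa, threshold=0.25):
    """Two-state label using the 25% threshold from the summary."""
    return "exposed" if rsa >= threshold else "buried"


def three_state(rsa, low=0.09, high=0.36):
    """Three-state label using the 9% and 36% thresholds from the summary."""
    if rsa < low:
        return "buried"
    if rsa < high:
        return "intermediate"
    return "exposed"


# Toy data, invented for this example: 2-D feature vectors and RSA values.
toy_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_rsa = [0.1, 0.2, 0.3, 0.8, 0.9]
predicted = predict_rsa((0.0, 0.0), toy_features, toy_rsa, k=3)
print(predicted, two_state(predicted), three_state(predicted))
```

In this toy case the three nearest entries all lie closer than the average distance, so each receives a positive weight and the closest one contributes most to the estimate.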