Sann: Solvent accessibility prediction of proteins by nearest neighbor method


Bibliographic details
Published in: Proteins: Structure, Function, and Bioinformatics, 2012-07, Vol. 80 (7), p. 1791-1797
Main authors: Joo, Keehyoung; Lee, Sung Jong; Lee, Jooyoung
Format: Article
Language: English
Description
Summary: We present a method for predicting the solvent accessibility of proteins, based on a nearest neighbor method applied to sequence profiles. The method yields continuous real‐value predictions as well as two‐state and three‐state discrete predictions. It uses the z‐score of the distance measure in the feature vector space to estimate the relative contribution of each of the k‐nearest neighbors to the discrete and continuous solvent accessibility predictions. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a sequence identity cutoff of 25%. Using optimal parameters, the method achieves discrete prediction accuracies of 78.38% (two‐state prediction with a threshold of 25%) and 65.1% (three‐state prediction with thresholds of 9% and 36%), and a Pearson correlation coefficient of 0.676 between the predicted and true relative solvent accessibility (RSA) values for continuous prediction. In an independent benchmark test on the CASP8 targets, the proposed method outperforms existing methods: the prediction accuracies are 80.89% (two‐state prediction with a threshold of 25%) and 67.58% (three‐state prediction), and the Pearson correlation coefficient is 0.727 (continuous prediction) with a mean absolute error of 0.148. We also investigated the effect of increasing database size on prediction accuracy and observed additional improvement in accuracy as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/. Proteins 2012; © 2012 Wiley Periodicals, Inc.
ISSN: 0887-3585, 1097-0134
DOI: 10.1002/prot.24074
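
The summary describes a z‐score‐weighted k‐nearest‐neighbor prediction, followed by thresholding into two or three states and evaluation by Pearson correlation and mean absolute error. The following is a minimal Python sketch of that idea, not the authors' implementation. The abstract does not give the exact distance measure or weighting formula, so the Euclidean distance, the weight `max(0, -z)`, the boundary conventions (`>` versus `>=`), the state labels, and all function names here are illustrative assumptions. Only the thresholds (25% for two states, 9% and 36% for three states) come from the summary, expressed as fractions of RSA.

```python
import math


def knn_rsa(query, database, k=3):
    """Predict a continuous RSA value for one residue (illustrative sketch).

    database: list of (feature_vector, rsa) pairs; in the paper the features
    come from sequence profiles, here they are just lists of floats.
    """
    # Assumption: Euclidean distance in feature space.
    dists = [math.dist(query, feats) for feats, _ in database]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))

    # Indices of the k entries closest to the query.
    nearest = sorted(range(len(database)), key=dists.__getitem__)[:k]

    # Assumption: a neighbor's weight is its negated distance z-score,
    # so unusually close neighbors contribute more.
    if sigma > 0:
        weights = [max(0.0, -(dists[i] - mu) / sigma) for i in nearest]
    else:
        weights = [1.0] * len(nearest)
    if sum(weights) == 0:
        weights = [1.0] * len(nearest)  # fall back to a plain average

    total = sum(weights)
    return sum(w * database[i][1] for w, i in zip(weights, nearest)) / total


def two_state(rsa, threshold=0.25):
    """Two-state label using the 25% threshold from the summary."""
    return "exposed" if rsa > threshold else "buried"


def three_state(rsa, low=0.09, high=0.36):
    """Three-state label using the 9% and 36% thresholds from the summary."""
    if rsa < low:
        return "buried"
    if rsa < high:
        return "intermediate"
    return "exposed"


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_error(xs, ys):
    """Mean absolute difference between predicted and true values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Toy data, invented purely for illustration.
toy_db = [([0.0, 0.0], 0.10), ([1.0, 0.0], 0.20),
          ([0.0, 1.0], 0.30), ([5.0, 5.0], 0.90)]
predicted = knn_rsa([0.0, 0.0], toy_db, k=3)
print(predicted, two_state(predicted), three_state(predicted))
```

In this sketch the z‐score is computed against all database distances for the query, so a neighbor counts for more when it is unusually close relative to the rest of the database. Whether SANN normalizes in the same way is not stated in the summary.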