Fine‐grained protein fold assignment by support vector machines using generalized n‐peptide coding schemes and jury voting from multiple‐parameter sets

Bibliographic details
Published in: Proteins: Structure, Function, and Bioinformatics, 2003-03, Vol. 50 (4), pp. 531-536
Main authors: Yu, Chin‐Sheng; Wang, Jung‐Ying; Yang, Jinn‐Moon; Lyu, Ping‐Chiang; Lin, Chih‐Jen; Hwang, Jenn‐Kang
Format: Article
Language: English
Description
Abstract: In the coarse‐grained fold assignment of major protein classes, such as all‐α, all‐β, α + β, and α/β proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine‐grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge because of the much larger number of folds. A recent study reported a reasonable prediction accuracy of 56.0% on an independent set covering the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method to fine‐grained fold prediction, using a combination of protein descriptors based on properties derived from n‐peptide composition together with jury voting, and achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than the previous result. With 10‐fold cross‐validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence‐coding schemes can significantly improve fine‐grained fold prediction. Our approach should be useful in structure prediction and modeling. Proteins 2003;50:531–536. © 2003 Wiley‐Liss, Inc.
ISSN: 0887-3585; 1097-0134
DOI:10.1002/prot.10313
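The abstract names two ingredients: descriptors built from n‐peptide composition, and jury voting over classifiers trained with multiple parameter sets. The Python sketch below illustrates both under stated assumptions and is not the authors' code. It encodes a sequence as plain overlapping n‐peptide frequencies over the 20 standard residues (the paper's "generalized" scheme derives descriptors from properties of this composition and may differ), and it combines fold labels from several hypothetical SVMs with a simple majority vote. The names `npeptide_composition` and `jury_vote`, the first-seen tie-breaking rule, and the fold labels are illustrative assumptions; the SVM training step itself is omitted.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes); other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(seq: str, n: int) -> list[float]:
    """Return the relative frequency of each of the 20**n possible n-peptides.

    Overlapping windows of length n are counted; windows containing a
    non-standard residue are ignored. This is an assumed encoding,
    not necessarily the paper's exact descriptor.
    """
    seq = seq.upper()
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    windows = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(windows)
    total = len(windows)
    # Fixed lexicographic order so every sequence maps to the same feature layout.
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return [counts[k] / total if total else 0.0 for k in keys]


def jury_vote(predictions: list[str]) -> str:
    """Majority vote over fold labels predicted by several classifiers.

    Ties go to the label that appears first in `predictions` (an assumed rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    x1 = npeptide_composition(seq, 1)
    x2 = npeptide_composition(seq, 2)
    print(len(x1), len(x2))  # 20 400
    # Hypothetical fold labels from SVMs trained with different parameter sets.
    print(jury_vote(["fold_7", "fold_2", "fold_7", "fold_2", "fold_5"]))  # fold_7
```

In a full pipeline, feature vectors such as the n = 1 and n = 2 compositions would be passed to an SVM implementation, one model per parameter set, and the vote would replace any single model's prediction. The vector length grows as 20**n, which is why small n values are the practical choice.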