A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination

Bibliographic details
Published in:	Computational Biology and Chemistry, 2015-12, Vol. 59, p. 95-100
Main authors:	Li, Xiaowei; Liu, Taigang; Tao, Peiying; Wang, Chunhua; Chen, Lanming
Format:	Article
Language:	English
Subjects:	Auto cross covariance; Computational Biology - methods; Databases, Protein; Low-similarity; Position-specific score matrix; Protein Conformation; Protein Folding; Proteins - chemistry; Proteins - classification; Recursive feature elimination; Sequence Analysis, Protein; Support vector machine
Online access:	Full text

Description
Highlights:
• Prediction performance for protein structural class has been improved.
• A high-quality feature extraction technique has been designed.
• Recursive feature selection has been used to reduce the number of features.
Summary:	Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed in recent years to improve the prediction accuracy of protein structural class, but low-similarity sequences remain a challenge. In this study, we introduce a feature extraction technique based on the auto cross covariance (ACC) transformation of the position-specific score matrix (PSSM) to represent a protein sequence. Support vector machine-recursive feature elimination (SVM-RFE) is then used to select the top K features by importance, and these features are input to a support vector machine (SVM) for prediction. The method is evaluated with the jackknife test on three low-similarity datasets: D640, 1189 and 25PDB. It achieves overall accuracies of 97.2%, 96.2%, and 93.3% on these datasets, respectively, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class, especially for low-similarity datasets.
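The summary names the ACC transformation of a PSSM but this record does not give its formula. As a minimal sketch, the following assumes the commonly used definition: for each lag, the auto covariance (AC) of one PSSM column with itself and the cross covariance (CC) of every ordered pair of distinct columns, each averaged over the positions where both terms exist. The function name `acc_features`, the parameter `max_lag`, and the normalisation are illustrative assumptions, not details confirmed by the article.

```python
from typing import List, Sequence


def acc_features(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Return AC features followed by CC features for one PSSM (rows = residues).

    Hypothetical helper: the lag range and normalisation are assumptions.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Centre each column on its mean over the whole sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    centred = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    def covariance(c1: int, c2: int, lag: int) -> float:
        # Average product of column c1 at position j and column c2 at j + lag.
        total = sum(centred[j][c1] * centred[j + lag][c2]
                    for j in range(length - lag))
        return total / (length - lag)

    lags = range(1, max_lag + 1)
    # Auto covariance: the same column at both positions.
    ac = [covariance(c, c, lag) for c in range(n_cols) for lag in lags]
    # Cross covariance: every ordered pair of distinct columns.
    cc = [covariance(c1, c2, lag)
          for c1 in range(n_cols)
          for c2 in range(n_cols) if c1 != c2
          for lag in lags]
    return ac + cc


# Toy 4-residue, 2-column matrix (a real PSSM has 20 columns).
features = acc_features([[1, 0], [3, 2], [5, 0], [7, 2]], max_lag=1)
print(len(features))  # 4
```

Under this definition a 20-column PSSM yields 20 × max_lag AC values and 380 × max_lag CC values, so every protein maps to a vector of the same length regardless of sequence length. That fixed-length vector is what a feature selection step such as SVM-RFE would then rank.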
ISSN:	1476-9271, 1476-928X
DOI:	10.1016/j.compbiolchem.2015.08.012