MCP: A multi-component learning machine to predict protein secondary structure

Bibliographic details
Published in:	Computers in Biology and Medicine, 2019-07, Vol. 110, p. 144-155
Main authors:	Khalatbari, Leila; Kangavari, M.R.; Hosseini, Saeid; Yin, Hongzhi; Cheung, Ngai-Man
Format:	Article
Language:	English
Subjects:	Accuracy; Acids; Amino acid sequence; Artificial intelligence; Biology; Deoxyribonucleic acid; DNA; DNA structure; Ensemble prediction machine; Experimental methods; Fuzzy k-nearest neighbor; Learning algorithms; Machine learning; Methods; Nucleotide sequence; Protein secondary structure prediction; Protein structure; Proteins; Secondary structure; Singers; Structure-function relationships; Support vector machine; Support vector machines
Online access:	Full text

Description
Abstract:	The gene or DNA sequence in every cell does not control genetic properties on its own; rather, this is done through the translation of DNA into protein and the subsequent formation of a specific 3D structure. The biological function of a protein is tightly connected to its specific 3D structure. Prediction of the protein secondary structure is a crucial intermediate step towards elucidating its 3D structure and function. Traditional experimental methods for determining protein structure are expensive and time-consuming. Nevertheless, the average accuracy of the computational prediction methods suggested so far has hardly exceeded 80%. The possible underlying reasons are the ambiguous sequence-structure relation, noise in input protein data, class imbalance, and the high dimensionality of the encoding schemes. In our approach, we use a compound string dissimilarity measure to interpret protein sequence content directly and avoid information loss. To improve accuracy, we employ two different classifiers, a support vector machine and a fuzzy nearest neighbor classifier, and aggregate their classification outcomes to infer the final protein structures. We conduct comprehensive experiments to compare our model with the current state-of-the-art approaches. The experimental results demonstrate that, given a set of input sequences, our multi-component framework can accurately predict the protein structure. Nevertheless, the effectiveness of our unified model can be further enhanced through framework configuration.
• Extracting features from protein sequences may cause information loss.
• Interpreting the protein latent language can help preserve more information.
• A compound dissimilarity measure with various coefficients can effectively interpret the protein language.
• A multi-component architecture better addresses the complexity of structure prediction.
• A flexible and effective fuzzy aggregation pool enhances the accuracy of a multi-component framework.
ISSN:	0010-4825, 1879-0534
DOI:	10.1016/j.compbiomed.2019.04.040
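
The abstract names three ingredients: a compound string dissimilarity measure, a fuzzy nearest neighbor classifier, and an aggregation step that combines the outputs of several classifiers. The sketch below illustrates how such pieces could fit together; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the two component distances (Hamming and bigram Jaccard), the coefficients `alpha` and `beta`, the three-state labels H/E/C, the toy training windows, and the weighted-average aggregation. The second classifier's scores are a hard-coded stand-in for a support vector machine, which is not implemented here. The fuzzy k-nearest-neighbor rule follows the standard inverse-distance weighting with fuzzifier `m`.

```python
from collections import defaultdict

CLASSES = ("H", "E", "C")  # assumed three-state labels: helix, strand, coil


def hamming(a, b):
    """Fraction of mismatching positions between two equal-length windows."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bigram_jaccard(a, b):
    """1 minus the Jaccard similarity of the sets of adjacent residue pairs."""
    sa = {a[i:i + 2] for i in range(len(a) - 1)}
    sb = {b[i:i + 2] for i in range(len(b) - 1)}
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def compound_dissimilarity(a, b, alpha=0.7, beta=0.3):
    """Weighted sum of two string distances (coefficients are hypothetical)."""
    return alpha * hamming(a, b) + beta * bigram_jaccard(a, b)


def fuzzy_knn(query, train, k=3, m=2.0, eps=1e-9):
    """Return class -> membership for `query` from its k nearest windows.

    `train` is a list of (window, label) pairs with crisp labels.
    Each neighbor votes with weight 1 / d^(2 / (m - 1)); eps avoids
    division by zero when a training window is identical to the query.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    neighbours = sorted(
        ((compound_dissimilarity(query, w), lab) for w, lab in train),
        key=lambda t: t[0],
    )[:k]
    weights = defaultdict(float)
    for d, lab in neighbours:
        weights[lab] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {c: weights[c] / total for c in CLASSES}


def aggregate(score_dicts, coefficients=None):
    """Weighted average of per-class scores; returns (label, scores)."""
    coefficients = coefficients or [1.0] * len(score_dicts)
    combined = {
        c: sum(w * s.get(c, 0.0) for w, s in zip(coefficients, score_dicts))
        for c in CLASSES
    }
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all scores are zero")
    scores = {c: v / total for c, v in combined.items()}
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration only.
train = [
    ("AEELLKK", "H"), ("AEELLKR", "H"),
    ("VTVTVYV", "E"), ("VKVTVEV", "E"),
    ("GPGSGNG", "C"), ("GPDSGNP", "C"),
]
query = "AEELLKA"

knn_scores = fuzzy_knn(query, train, k=3)
svm_like_scores = {"H": 0.6, "E": 0.1, "C": 0.3}  # stand-in, not a real SVM
label, scores = aggregate([knn_scores, svm_like_scores])
print(label, scores)
```

Working on raw residue windows rather than on a numeric encoding mirrors the abstract's point about avoiding information loss from feature extraction, and keeping the aggregation as a separate function makes it easy to swap in other component classifiers or coefficients.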