Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis

Bibliographic details
Published in: Computational biology and chemistry, 2016-12, Vol. 65, p. 37-44
Main authors: Vyas, Renu; Bapat, Sanket; Jain, Esha; Karthikeyan, Muthukumarasamy; Tambe, Sanjeev; Kulkarni, Bhaskar D.
Format: Article
Language: English
Description
Summary:
• New protein fingerprints for capturing the topological properties of protein complexes in a linear format.
• An SVM-based predictive model for discriminating diabetes from non-diabetes complexes, with an AUC of 0.78.
• Model tested on an external data set derived from text mining a large number of PubMed abstracts.
• Network modeling to identify new disease targets.

To understand the molecular mechanism underlying any disease, knowledge of the interacting proteins in the disease pathway is essential. The number of known protein-protein interactions (PPIs) is still very limited compared with the protein sequences available for different organisms. Although experiment-based high-throughput technologies provide some data about these interactions, those data are often fairly noisy. Computational techniques for predicting protein-protein interactions are therefore important. In this work, 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the PDB server. In a case study, these fingerprints were created for proteins implicated in type 2 diabetes mellitus. The fingerprints were used as input to an SVM-based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC of 0.78) on an external data set composed of proteins retrieved by text mining of diabetes-related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, for functional annotation, and for other proteomics studies.
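The summary reports both a classification accuracy and an AUC for the external data set. As a rough illustration of how these two metrics differ, the sketch below computes them from classifier decision scores in plain Python. It is not the authors' code: the function names, the decision threshold of 0.0, and the toy labels and scores are all assumptions made up for this example, and none of the numbers come from the article.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical decision scores for six proteins (1 = disease, 0 = non-disease).
# These values are invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [1.2, 0.4, -0.3, 0.6, -0.8, -1.5]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"accuracy = {toy_accuracy:.3f}, AUC = {toy_auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the two are usually reported side by side.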
ISSN: 1476-9271, 1476-928X
DOI: 10.1016/j.compbiolchem.2016.09.011