Sequence-based prediction of protein interaction sites with an integrative method

Bibliographic details
Published in:	Bioinformatics 2009-03, Vol.25 (5), p.585-591
Main authors:	Chen, Xue-wen; Jeong, Jong Cheol
Format:	Article
Language:	English
Subjects:	Algorithms; Binding Sites; Biological and medical sciences; Computational Biology - methods; Databases, Protein; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Molecular; Protein Conformation; Proteins - chemistry; Sequence Analysis, Protein - methods

Description
Abstract:	Motivation: Identification of protein interaction sites has significant impact on understanding protein function, elucidating signal transduction networks and drug design studies. With the exponentially growing protein sequence data, predictive methods using sequence information only for protein interaction site prediction have drawn increasing interest. In this article, we propose a predictive model for identifying protein interaction sites. Without using any structure data, the proposed method extracts a wide range of features from protein sequences. A random forest-based integrative model is developed to effectively utilize these features and to deal with the imbalanced data classification problem commonly encountered in binding site predictions. Results: We evaluate the predictive method using 2829 interface residues and 24 616 non-interface residues extracted from 99 polypeptide chains in the Protein Data Bank. The experimental results show that the proposed method performs significantly better than two other sequence-based predictive methods and can reliably predict residues involved in protein interaction sites. Furthermore, we apply the method to predict interaction sites and to construct three protein complexes: the DnaK molecular chaperone system, 1YUW and 1DKG, which provide new insight into the sequence–function relationship. We show that the predicted interaction sites can be valuable as a first approach for guiding experimental methods investigating protein–protein interactions and localizing the specific interface residues. Availability: Datasets and software are available at http://ittc.ku.edu/~xwchen/bindingsite/prediction. Contact: xwchen@ku.edu Supplementary information: Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/btp039