A regularized discriminative model for the prediction of protein–peptide interactions

Bibliographic details
Published in: Bioinformatics 2006-03, Vol. 22 (5), p. 532-540
Main authors: Lehrach, Wolfgang P.; Husmeier, Dirk; Williams, Christopher K. I.
Format: Article
Language: English
Description
Summary:
Motivation: Short well-defined domains known as peptide recognition modules (PRMs) regulate many important protein–protein interactions involved in the formation of macromolecular complexes and biochemical pathways. Since high-throughput experiments like yeast two-hybrid and phage display are expensive and intrinsically noisy, it would be desirable to more specifically target or partially bypass them with complementary in silico approaches. In this paper, we present a probabilistic discriminative approach to predicting PRM-mediated protein–protein interactions from sequence data. The model is motivated by the discriminative model of Segal and Sharan as an alternative to the generative approach of Reiss and Schwikowski. In our evaluation, we focus on predicting the interaction network. As proposed by Williams, we overcome the problem of susceptibility to over-fitting by adopting a Bayesian a posteriori approach based on a Laplacian prior in parameter space.
Results: The proposed method was tested on two datasets of protein–protein interactions involving 28 SH3 domain proteins in Saccharomyces cerevisiae, where the datasets were obtained with different experimental techniques. The predictions were evaluated with out-of-sample receiver operator characteristic (ROC) curves. In both cases, Laplacian regularization turned out to be crucial for achieving a reasonable generalization performance. The Laplacian-regularized discriminative model outperformed the generative model of Reiss and Schwikowski in terms of the area under the ROC curve on both datasets. The performance was further improved with a hybrid approach, in which our model was initialized with the motifs obtained with the method of Reiss and Schwikowski.
Availability: Software and supplementary material is available from
Contact: wlehrach@ed.ac.uk
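The summary's two technical ingredients, a posteriori estimation under a Laplacian prior and evaluation by out-of-sample area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' model: plain logistic regression on synthetic features stands in for the sequence-based discriminative model, and all names (`neg_log_posterior`, `fit_map_l1`, `roc_auc`, the penalty weight `lam`) are hypothetical. The sketch relies only on the standard fact that a Laplacian prior on the weights contributes an L1 penalty to the negative log-posterior.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_posterior(w, X, y, lam):
    """Logistic negative log-likelihood plus the L1 term that a Laplacian
    prior on w contributes (additive constants dropped)."""
    z = X @ w
    # log(1 + exp(z)) - y*z is the per-example logistic loss for y in {0, 1}
    return np.sum(np.logaddexp(0.0, z) - y * z) + lam * np.sum(np.abs(w))

def fit_map_l1(X, y, lam, n_iter=2000):
    """Proximal gradient descent (ISTA) on neg_log_posterior.
    Assumption: a generic solver chosen for brevity, not the paper's optimizer."""
    # Step 1/L, where L = sigma_max(X)^2 / 4 bounds the curvature of the loss
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - step * (X.T @ (sigmoid(X @ w) - y))
        # Soft-thresholding is the proximal operator of the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def roc_auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    ranked correctly, with ties counted as one half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Synthetic data (an assumption for illustration): 3 informative features of 20
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -2.0, 1.5]
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)

# Fit on the first 150 rows, score the held-out 50 (out-of-sample AUC)
w_hat = fit_map_l1(X[:150], y[:150], lam=5.0)
auc = roc_auc(X[150:] @ w_hat, y[150:])
```

Increasing `lam` drives more weights to exactly zero, which is the sparsity effect usually cited for Laplacian regularization; the held-out AUC is one way to choose it.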
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/bti804