Sampling strategy for protein complex prediction using cluster size frequency

Bibliographic details
Published in:	Gene 2013-04, Vol.518 (1), p.152-158
Main authors:	Tatsuke, Daisuke; Maruyama, Osamu
Format:	Article
Language:	English
Subjects:	Algorithms; genes; Markov chain; Markov chain Monte Carlo; Metropolis–Hastings; Multiprotein Complexes - genetics; Multiprotein Complexes - metabolism; Power-law; prediction; Protein complex; Protein Interaction Mapping - methods; Protein Multimerization; proteins; Protein–protein interaction; Sampling; yeasts
Online access:	Full text

Description
Abstract:	In this paper we propose a Markov chain Monte Carlo sampling method for predicting protein complexes from protein–protein interactions (PPIs). Many of the existing tools for this problem are based, more or less, on a density measure of a subgraph of the PPI network. Measures of this kind are less effective for smaller complexes. On the other hand, the number of complexes of a given size in a database of protein complexes follows a power law, so most complexes are small. For example, in CYC2008, a database of curated protein complexes of yeast, 42% of the complexes are heterodimeric, i.e., they consist of two different proteins. In this work, we propose a protein complex prediction algorithm, called PPSampler (Proteins' Partition Sampler), which is based on the Metropolis–Hastings algorithm and uses a parameter representing a target value of the relative frequency of the number of predicted protein complexes of a particular size. In a performance comparison, PPSampler outperforms other existing algorithms. Furthermore, about half of the predicted clusters that do not match any known complex in CYC2008 are statistically significant with respect to Gene Ontology terms. Some of them can be expected to be true complexes. ► This is the first MCMC method for the protein complex prediction problem. ► The proposed method, called PPSampler, is shown to outperform seven popular tools. ► Predicted complexes not matching any known complexes are statistically significant.
ISSN:	0378-1119 1879-0038
DOI:	10.1016/j.gene.2012.11.050
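
Illustrative sketch: the abstract describes sampling partitions of the proteins with a Metropolis–Hastings scheme guided by a target relative frequency of cluster sizes. The Python code below is a minimal toy version of that idea and is not the PPSampler implementation. The scoring function (`log_score`), the squared-error frequency penalty, the exponent `gamma`, the weight `lam`, and all function names are assumptions introduced here for illustration; this record does not specify the published scoring function or proposal moves.

```python
import math
import random
from collections import Counter


def power_law_target(max_size, gamma=2.0):
    """Assumed target: psi(s) proportional to s**(-gamma) for sizes 2..max_size."""
    raw = {s: s ** (-gamma) for s in range(2, max_size + 1)}
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()}


def clusters_of(assign):
    """Group proteins by label; keep only clusters with at least two members."""
    groups = {}
    for protein, label in assign.items():
        groups.setdefault(label, set()).add(protein)
    return [members for members in groups.values() if len(members) >= 2]


def log_score(assign, weights, psi, lam=1.0):
    """Hypothetical log-score (an assumption, not the published function):
    total PPI weight inside clusters minus lam times the squared difference
    between observed and target relative frequencies of cluster sizes."""
    edge_term = sum(w for (u, v), w in weights.items() if assign[u] == assign[v])
    sizes = Counter(len(c) for c in clusters_of(assign))
    n = sum(sizes.values())
    mismatch = 0.0
    for s in set(psi) | set(sizes):
        observed = sizes[s] / n if n else 0.0
        mismatch += (observed - psi.get(s, 0.0)) ** 2
    return edge_term - lam * mismatch


def sample_partition(proteins, weights, psi, steps=5000, lam=1.0, seed=0):
    """Metropolis sampling over label assignments; returns the best state seen."""
    rng = random.Random(seed)
    proteins = list(proteins)
    labels = range(len(proteins))  # fixed label set keeps the proposal symmetric
    assign = {p: i for i, p in enumerate(proteins)}  # start from singletons
    current = log_score(assign, weights, psi, lam)
    best, best_score = dict(assign), current
    for _ in range(steps):
        protein = rng.choice(proteins)
        old_label = assign[protein]
        assign[protein] = rng.choice(labels)  # propose moving one protein
        proposed = log_score(assign, weights, psi, lam)
        # Accept with probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best_score:
                best, best_score = dict(assign), current
        else:
            assign[protein] = old_label  # reject: undo the move
    return clusters_of(best), best_score


# Toy usage with made-up interaction weights (not real PPI data).
toy_weights = {("A", "B"): 0.9, ("C", "D"): 0.8}
toy_clusters, toy_score = sample_partition("ABCDE", toy_weights, power_law_target(3))
```

Because each proposal picks a protein and a label uniformly from a fixed label set, the proposal is symmetric, so the Metropolis–Hastings acceptance ratio reduces to the ratio of scores. A practical sampler would likely update the score incrementally rather than recomputing it at every step.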