Parallelization of the functional flow algorithm for prediction of protein function using protein-protein interaction networks

Bibliographic details
Main authors: Akkoyun, Emrah; Can, Tolga
Format: Conference proceeding
Language: English
Description
Summary: Protein-protein interaction networks provide important information about the functions of proteins. Various studies analyze interaction networks and predict the functions of novel proteins based on their network connectivity. However, all of these methods are sequential and do not utilize high-performance computing. Functional flow is one of these methods; it uses network connectivity, distance effect, and the topology of the network, with local and global views, to predict protein function. With these advantages, the functional flow algorithm produces more accurate results than other techniques. However, because no parallelized version of the algorithm exists, the method cannot be practically applied to large-scale networks of complex species. In this paper, we provide a parallel implementation of functional flow. We use Hadoop, which is one of the open-source map/reduce environments. For our experiments, we installed Hadoop on 18 hosts with eight cores each. The first map/reduce job distributes the protein interaction network in a format that allows parallel distributed computing on all the worker nodes. The other map/reduce jobs generate flows for each known protein function, and the functions of novel proteins are predicted by accumulating all of these generated flows. Our experiments show that the method can be distributed across worker nodes efficiently and that the application performs better as the number of resources increases.
DOI:10.1109/HPCSim.2011.5999807
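
Illustrative sketch (not part of the catalog record or the authors' code): the job structure described in the summary can be imitated in plain Python, with ordinary functions standing in for Hadoop map and reduce tasks. The abstract does not state the flow rule, so the rule below is an assumption based on the commonly described formulation of functional flow: annotated proteins act as unlimited sources, and at each step a node pushes flow only to neighbours with a smaller reservoir, capped by the edge weight. All function names (`build_adjacency`, `flow_map`, `flow_reduce`, `functional_flow`, `predict`) are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Stand-in for the first job: regroup an undirected weighted edge list
    into one adjacency record per protein, so each record can be processed
    independently."""
    adjacency = defaultdict(dict)
    for u, v, w in edges:
        adjacency[u][v] = w
        adjacency[v][u] = w
    return dict(adjacency)


def flow_map(node, neighbors, reservoir):
    """Map step (assumed rule): push flow to neighbours with a smaller
    reservoir, proportionally to edge weight and capped by it."""
    total_w = sum(neighbors.values())
    for nb, w in neighbors.items():
        if reservoir[node] > reservoir[nb]:
            amount = min(w, reservoir[node] * w / total_w)
            yield nb, amount      # inflow to the neighbour
            yield node, -amount   # matching outflow from this node


def flow_reduce(messages):
    """Reduce step: sum net reservoir change and total inflow per node."""
    delta = defaultdict(float)
    inflow = defaultdict(float)
    for node, amount in messages:
        delta[node] += amount
        if amount > 0:
            inflow[node] += amount
    return delta, inflow


def functional_flow(adjacency, sources, steps):
    """Run `steps` map/reduce rounds for one function; return inflow scores."""
    reservoir = {n: 0.0 for n in adjacency}
    for s in sources:
        if s in reservoir:
            reservoir[s] = float("inf")  # annotated proteins never run dry
    score = {n: 0.0 for n in adjacency}
    for _ in range(steps):
        messages = [
            msg
            for node, nbrs in adjacency.items()
            for msg in flow_map(node, nbrs, reservoir)
        ]
        delta, inflow = flow_reduce(messages)
        for n in adjacency:
            reservoir[n] += delta.get(n, 0.0)
            score[n] += inflow.get(n, 0.0)
    return score


def predict(adjacency, annotations, steps=2):
    """Accumulate flows per known function and pick the highest-scoring
    function for every unannotated protein."""
    scores = {
        func: functional_flow(adjacency, srcs, steps)
        for func, srcs in annotations.items()
    }
    annotated = set().union(*annotations.values())
    prediction = {
        n: max(scores, key=lambda func: scores[func][n])
        for n in adjacency
        if n not in annotated
    }
    return prediction, scores
```

On a toy chain A-B-C-D with unit weights, where A is annotated with one function and D with another, two rounds assign B the function of A and C the function of D. In a real Hadoop deployment each round would be a separate job over the adjacency records, which this single-process sketch only approximates.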