PANDA: Protein function prediction using domain architecture and affinity propagation


Bibliographic Details
Published in:	Scientific reports 2018-02, Vol.8 (1), p.3484-10, Article 3484
Main Authors:	Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan
Format:	Article
Language:	English
Subjects:	631/114/2410; 631/114/794; Affinity; Algorithms; Bayes Theorem; Bayesian analysis; Computational Biology; Databases, Protein; Deoxyribonucleic acid; DNA; Gene Ontology; Humanities and Social Sciences; multidisciplinary; Propagation; Protein Conformation; Protein Domains - genetics; Proteins - chemistry; Proteins - genetics; Science; Science (multidisciplinary); Software; Statistical analysis
Online Access:	Full text

Description
Summary:	We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the form of Gene Ontology (GO) terms. PANDA first runs a profile-profile alignment algorithm to search the PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt to search for homologues. PANDA integrates a domain architecture inference algorithm, based on Bayesian statistics, that calculates the probability of a protein having a GO term. All candidate GO terms are pooled and filtered by Z-score. The remaining GO terms are then clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of every baseline predictor that PANDA integrates, as well as every pooling and filtering step of PANDA. PANDA achieves better performance than the baseline predictors in terms of area under the precision-recall curve. PANDA can be accessed at http://dna.cs.miami.edu/PANDA/.
ISSN:	2045-2322
DOI:	10.1038/s41598-018-21849-1
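
The summary describes pooling candidate GO terms from several predictors and then filtering them by Z-score. The sketch below illustrates that idea only. The abstract does not state how scores are pooled or which Z-score cutoff is used, so the max-score pooling rule, the `threshold` value, the function names, and the example scores are all assumptions made for illustration and are not taken from the PANDA implementation.

```python
from statistics import mean, pstdev


def pool_scores(predictions):
    """Pool candidate GO terms from several predictors.

    Assumption: a term predicted by more than one source keeps its
    highest score. The abstract does not specify the pooling rule.
    """
    pooled = {}
    for prediction in predictions:
        for term, score in prediction.items():
            pooled[term] = max(score, pooled.get(term, 0.0))
    return pooled


def zscore_filter(scores, threshold=1.0):
    """Keep terms whose score is at least `threshold` population
    standard deviations above the mean of all pooled scores.

    Assumption: the cutoff of 1.0 is a placeholder, not PANDA's value.
    """
    values = list(scores.values())
    if len(values) < 2:
        return dict(scores)  # too few terms to standardize
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return dict(scores)  # all scores equal, nothing to separate
    return {
        term: score
        for term, score in scores.items()
        if (score - mu) / sigma >= threshold
    }


# Made-up scores from two hypothetical baseline predictors.
predictions = [
    {"GO:0003677": 0.9, "GO:0005515": 0.2},
    {"GO:0003677": 0.7, "GO:0006355": 0.3, "GO:0016020": 0.1},
]
pooled = pool_scores(predictions)
kept = zscore_filter(pooled, threshold=1.0)
print(sorted(kept))
```

In this toy input only one term scores well above the rest, so it is the only one that survives the filter. In the pipeline described by the summary, the surviving terms would then be clustered by affinity propagation over the GO directed acyclic graph, which this sketch does not attempt.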