Data from: An improved hypergeometric probability method for identification of functionally linked proteins using phylogenetic profiles
Authors: | , , , |
---|---|
Format: | Dataset |
Language: | English |
Abstract: | Predicting the functions of proteins and alternatively spliced
isoforms encoded in a genome is an important application of bioinformatics
in the post-genome era. Biochemical studies cannot practically characterize
every protein encoded in a genome. Bioinformatics methods therefore provide
powerful tools for functional annotation and prediction. They also help
narrow the growing sequence-to-function gap. Phylogenetic profiling is a
bioinformatics approach that tracks the distribution of a trait across
species. It can be used to infer the evolutionary history of proteins
encoded in genomes.
Here we propose an improved phylogenetic profile-based method. It considers
the co-evolution of the reference genome to derive the basic similarity
measure. It also uses the background phylogeny of the target genomes to
generate profiles and to assign weights to target genomes. The approach
uses the ordering of genomes and the runs of consecutive matches between
proteins to define phylogenetic relationships. We used the Escherichia coli
K12 genome as the reference genome, and its 4195 proteins were included in
the current analysis. We compared our approach with two existing methods.
Our initial results show that its predictions outperform both. In addition,
we validated our method using a targeted protein-protein interaction
network derived from the STRING protein-protein interaction database.
Our preliminary results indicate that function prediction can be improved
by placing the coevolution-based similarity measures and the runs on the
same scale, instead of computing them on different scales. Our method can
be applied at the whole-genome level to annotate hypothetical proteins from
prokaryotic genomes. |
---|---|
DOI: | 10.5061/dryad.m6t4j |
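The title refers to a hypergeometric probability method for phylogenetic profiles. The sketch below illustrates two basic ingredients of that kind of analysis. It is not the improved method described in this dataset.

The first ingredient is the classic hypergeometric co-occurrence score for two binary presence/absence profiles. The second is a count of runs of consecutive matches along a fixed genome ordering.

Several details are assumptions made for illustration:
- the function names and the 0/1 profile encoding;
- the definition of a "match" (both genomes in the same presence/absence state);
- the use of a pre-ordered genome list.

The dataset's own similarity measure, genome ordering, and weighting scheme are not reproduced here.

```python
# Hypothetical sketch: classic hypergeometric co-occurrence score and
# run statistics for binary phylogenetic profiles. This is NOT the
# improved method described in this dataset.
from math import comb
from typing import List, Sequence


def hypergeometric_pvalue(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Probability of observing at least z co-occurrences by chance.

    N genomes in total; protein A is present in x of them, protein B in y,
    and both are present in z. Returns P(Z >= z) under the hypergeometric
    distribution.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(profile_a)  # number of target genomes
    x = sum(profile_a)  # genomes containing protein A
    y = sum(profile_b)  # genomes containing protein B
    z = sum(1 for a, b in zip(profile_a, profile_b) if a and b)  # both present
    # Upper tail: sum P(Z = k) for k = z .. min(x, y).
    tail = sum(comb(x, k) * comb(n - x, y - k) for k in range(z, min(x, y) + 1))
    return tail / comb(n, y)


def match_runs(profile_a: Sequence[int], profile_b: Sequence[int]) -> List[int]:
    """Lengths of maximal runs of consecutive genomes in which both profiles
    are in the same state (an assumed definition of a match). The genome
    order is taken as given, e.g. already sorted by phylogeny."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    runs: List[int] = []
    current = 0
    for a, b in zip(profile_a, profile_b):
        if a == b:
            current += 1          # extend the current run of matches
        elif current:
            runs.append(current)  # a mismatch closes a non-empty run
            current = 0
    if current:
        runs.append(current)      # close a run that reaches the last genome
    return runs


if __name__ == "__main__":
    # Toy presence (1) / absence (0) profiles over 8 ordered genomes.
    a = [1, 1, 0, 1, 0, 0, 1, 1]
    b = [1, 1, 0, 0, 0, 0, 1, 0]
    print(round(hypergeometric_pvalue(a, b), 4))  # 10/56 -> 0.1786
    print(match_runs(a, b))                       # [3, 3]
```

Under the hypergeometric null, a small p-value suggests that two proteins co-occur across genomes more often than expected by chance. This is the usual signal for a functional link. The abstract does not say how the authors place run statistics and the similarity measure on a common scale, so this sketch keeps the two quantities separate.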