Inferring pathways from gene lists using a literature-derived network of biological relationships

Bibliographic details
Published in: Bioinformatics 2005-03, Vol.21 (6), p.788-793
Main authors: Rajagopalan, Dilip; Agarwal, Pankaj
Format: Article
Language: English
Online access: Full text
Description
Summary:
Motivation: A number of omic technologies such as transcriptional profiling, proteomics, literature searches, genetic association, etc. help in the identification of sets of important genes. A subset of these genes may act in a coordinated manner, possibly because they are part of the same biological pathway. Interpreting such gene lists and relating them to pathways is a challenging task. Databases of biological relationships between thousands of mammalian genes can help in deciphering omics data. The relationships between genes can be assembled into a biological network with each protein as a node and each relationship as an edge between two proteins (or nodes). This network may then be searched for subnetworks consisting largely of interesting genes from the omics experiment. The subset of genes in the subnetwork along with the web of relationships between them helps to decipher the underlying pathways. Finding such subnetworks that maximally include all proteins from the query set but few others is the focus for this paper.

Results: We present a heuristic algorithm and a scoring function that work well both on simulated data and on data from known pathways. The scoring function is an extension of a previous study for a single biological experiment. We use a simple set of heuristics that provide a more efficient solution than the simulated annealing method. We find that our method works on reasonably complex curated networks containing ∼9000 biological entities (genes and metabolites), and ∼30 000 biological relationships. We also show that our method can pick up a pathway signal from a query list including a moderate number of genes unrelated to the pathway. In addition, we quantify the sensitivity and specificity of the technique.

Contact: dilip_rajagopalan@gsk.com
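
Illustrative note: the summary describes a network with proteins as nodes and relationships as edges, searched for subnetworks that cover many query genes but few others. The Python sketch below is only a toy illustration of that idea. It is not the authors' heuristic or scoring function, neither of which is given in this record. The function names, the `max_linkers` limit, the `penalty` weight, and the single-letter node names are all hypothetical.

```python
from collections import defaultdict, deque


def build_network(relationships):
    """Undirected adjacency map: each entity is a node, each relationship an edge."""
    adj = defaultdict(set)
    for a, b in relationships:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def grow_subnetwork(adj, query, seed, max_linkers=1):
    """Toy greedy growth (an assumption, not the published algorithm).

    Starting from the query gene `seed`, add every query gene that can be
    reached from the current subnetwork through at most `max_linkers`
    consecutive non-query nodes, together with those linker nodes.
    """
    query = set(query)
    sub = {seed}
    frontier = deque([seed])
    while frontier:
        start = frontier.popleft()
        paths = deque([[start]])  # bounded breadth-first search from `start`
        seen = {start}
        while paths:
            path = paths.popleft()
            for nxt in adj[path[-1]]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in query:
                    if nxt not in sub:
                        sub.update(path[1:])  # linker nodes on the way
                        sub.add(nxt)
                        frontier.append(nxt)
                elif len(path) - 1 < max_linkers:
                    paths.append(path + [nxt])
    return sub


def toy_score(sub, query, penalty=0.5):
    """Hypothetical score: reward query genes, penalize non-query nodes."""
    query = set(query)
    return len(sub & query) - penalty * len(sub - query)


# Made-up example data: A, B, C, D, E are query genes; X, Y, Z are not.
relationships = [("A", "B"), ("B", "X"), ("X", "C"),
                 ("C", "Y"), ("Y", "Z"), ("Z", "D"), ("E", "F")]
query = {"A", "B", "C", "D", "E"}
network = build_network(relationships)
subnetwork = grow_subnetwork(network, query, seed="A", max_linkers=1)
print(sorted(subnetwork), toy_score(subnetwork, query))
```

In this toy setting, raising `max_linkers` lets the subnetwork reach more query genes at the cost of pulling in more non-query nodes, which is the trade-off the summary refers to as including the query proteins "but few others".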
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/bti069