Enhanced homology searching through genome reading frame predetermination

Motivation: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Traditionally, homology search-based methods are often the first approach employed in determining whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics 2004-06, Vol.20 (9), p.1416-1427
Hauptverfasser: Yuan, Jeffrey, Bush, Bruce, Elbrecht, Alex, Liu, Yuan, Zhang, Theresa, Zhao, Wenqing, Blevins, Richard
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Motivation: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Traditionally, homology search-based methods are often the first approach employed in determining whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes or motifs often are difficult to find using single query-based homology search algorithms against large sequence datasets such as the human genome. Therefore, the motivation behind this work was to develop an approach to enhance the sensitivity of traditional single query-based homology algorithms against genomic data without losing search selectivity. Results: We demonstrate that by searching against a genome fragmented into all possible reading frames, the sensitivity of homology-based searches is enhanced without degrading its selectivity. Using the ETS-domain, bromodomain and acetyl-CoA acetyltransferase gene as queries, we were able to demonstrate that direct protein–protein searches using BLAST2P or FASTA3 against a human genome segmented among all possible reading frames and translated was substantially more sensitive than traditional protein–DNA searches against a raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic analysis was employed to demonstrate that the algorithms remained selective, while comparisons of the algorithms showed that the protein–protein searches were more sensitive in identifying hits. Therefore, through the overprediction of reading frames by this method and the increased sensitivity of protein–protein based homology search algorithms, a genome can be deeply mined, potentially finding hits overlooked by protein–DNA searches against raw genomic data.
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/bth115