Performance-Guarantee Gene Predictions via Spliced Alignment

Bibliographic Details
Published in: Genomics (San Diego, Calif.), 1998-08, Vol. 51 (3), p. 332-339
Main authors: Mironov, Andrey A.; Roytberg, Michael A.; Pevzner, Pavel A.; Gelfand, Mikhail S.
Format: Article
Language: English
Description
Abstract: An important and still unsolved problem in gene prediction is designing an algorithm that not only predicts genes but also estimates the quality of individual predictions. Since experimental biologists are interested mainly in the reliability of individual predictions (rather than in the average reliability of an algorithm), we attempted to develop a gene recognition algorithm that guarantees a certain quality of predictions. We demonstrate here that the similarity level with a related protein is a reliable quality estimator for the spliced alignment approach to gene recognition. We also study the average performance of the spliced alignment algorithm for different targets on a complete set of human genomic sequences with known relatives and demonstrate that the average performance of the method remains high even for very distant targets. Using plant, fungal, and prokaryotic target proteins for recognition of human genes leads to accurate predictions with 95, 93, and 91% correlation coefficient, respectively. For target proteins with a similarity score above 60%, not only is the average correlation coefficient very high (97% and up), but the quality of individual predictions is also guaranteed to be at least 82%. This indicates that for this level of similarity the worst-case performance of the spliced alignment algorithm is better than the average-case performance of many statistical gene recognition methods.
ISSN: 0888-7543, 1089-8646
DOI: 10.1006/geno.1998.5251