Efficient Algorithms for Optimizing Whole Genome Alignment with Noise

Bibliographic Details
Main authors:	Lam, T. W., Lu, N., Ting, H. F., Wong, Prudence W. H., Yiu, S. M.
Format:	Book chapter
Language:	English
Subjects:	Applied sciences; Artificial intelligence; Computer science; control theory; systems; Dynamic Programming Algorithm; Exact sciences and technology; Opposite Orientation; Pattern recognition. Digital image processing. Computational geometry; Sense Strand; Size Requirement; Software; Space Requirement; Speech and sound recognition and synthesis. Linguistics
Online access:	Full text

Description
Summary:	Given the genomes (DNA) of two related species, the whole genome alignment problem is to locate regions on the genomes that possibly contain genes conserved over the two species. Motivated by existing heuristic-based software tools, we initiate the study of optimization problems that attempt to uncover conserved genes with a global concern. Another interesting feature in our formulation is the tolerance of noise. Yet this makes the optimization problems more complicated; a brute-force approach takes time exponential in the noise level. In this paper we show how an insight into the problem structure can lead to a drastic improvement in the time and space requirement (precisely, to $O(k^2 n^2)$ and $O(k^2 n)$, respectively, where $n$ is the size of the input and $k$ is the noise level). The reduced space requirement allows us to implement the new algorithms on a PC. It is exciting to see that when compared with the most popular whole genome alignment software (MUMMER) on real data sets, the new algorithms consistently uncover more conserved genes (that have been published by GenBank), while preserving the preciseness of the output.
ISSN:	0302-9743, 1611-3349
DOI:	10.1007/978-3-540-24587-2_38