Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification

Bibliographic details
Published in:	Journal of computational biology 2000-01, Vol.7 (3-4), p.345-362
Main authors:	Marsan, L; Sagot, M F
Format:	Article
Language:	English
Subjects:	Algorithms; Binding Sites - genetics; Computational Biology; Consensus Sequence; DNA, Bacterial - genetics; DNA, Bacterial - metabolism; Genes, Regulator; Genome, Bacterial; Life Sciences; Models, Genetic; Other; Promoter Regions, Genetic; Sequence Analysis, DNA - statistics & numerical data

Description
Abstract:	This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. A structured motif may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box), and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes (that is, the motifs themselves) are unknown at the start of the algorithm; finding them is precisely what the algorithms are meant to do. A suffix tree is used for finding such motifs. The algorithms are efficient enough to infer site consensi, such as promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, the time complexity of both algorithms scales linearly with N²n, where n is the average length of the sequences and N is their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.
ISSN:	1066-5277 1557-8666
DOI:	10.1089/106652700750050826
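
Illustration:	The definition of a structured motif in the abstract (p boxes, one substitution allowance per box, and p - 1 distance intervals) can be made concrete with a small sketch. The code below is not the suffix-tree algorithm of the paper: it is a naive scan that only checks whether an already known candidate motif occurs in each sequence. All names (hamming, occurs, count_sequences_with_motif) are hypothetical, each substitution rate is assumed to be a maximum number of mismatches, and each distance is assumed to be measured from the end of one box to the start of the next.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq: str, boxes: List[str], max_subs: List[int],
           gaps: List[Tuple[int, int]]) -> bool:
    """Return True if the structured motif occurs somewhere in seq.

    boxes    -- the p box contents (assumed known here, unlike in the paper)
    max_subs -- p mismatch allowances, one per box (assumption: absolute counts)
    gaps     -- p - 1 (dmin, dmax) pairs, one per pair of successive boxes
                (assumption: end of one box to start of the next)
    """
    def extend(i: int, start: int) -> bool:
        box = boxes[i]
        end = start + len(box)
        # Box i must fit in the sequence and match within its allowance.
        if end > len(seq) or hamming(seq[start:end], box) > max_subs[i]:
            return False
        if i == len(boxes) - 1:
            return True
        dmin, dmax = gaps[i]
        # Try every allowed distance to the next box.
        return any(extend(i + 1, end + d) for d in range(dmin, dmax + 1))

    return any(extend(0, s) for s in range(len(seq)))


def count_sequences_with_motif(seqs: List[str], boxes: List[str],
                               max_subs: List[int],
                               gaps: List[Tuple[int, int]]) -> int:
    """Count how many sequences contain at least one occurrence."""
    return sum(occurs(s, boxes, max_subs, gaps) for s in seqs)


# Toy data: two boxes separated by 16 to 18 bases, one mismatch allowed per box.
toy_boxes = ["TTGACA", "TATAAT"]
toy_gaps = [(16, 18)]
toy_seqs = [
    "CC" + "TTGACA" + "G" * 17 + "TATAAT" + "CC",  # exact boxes, distance 17
    "CC" + "TTGACA" + "G" * 10 + "TATAAT",         # distance 10, outside interval
    "TTGTCA" + "A" * 16 + "TATGAT",                # one mismatch in each box
]
print(count_sequences_with_motif(toy_seqs, toy_boxes, [1, 1], toy_gaps))
```

The scan examines every start position of every sequence for a single candidate, so it says nothing about how the paper's algorithms reach their stated N²n scaling; it only shows what counts as an occurrence under the assumptions above.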