High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites


Bibliographic details
Published in: Nature Biotechnology, 2002-08, Vol. 20 (8), p. 831-835
Main authors: Roulet, Emmanuelle; Busso, Stéphane; Camargo, Anamaria A.; Simpson, Andrew J.G.; Mermod, Nicolas; Bucher, Philipp
Format: Article
Language: English
Description
Summary: The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile [1] was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms [2, 3]. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of the desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) [4] and serial analysis of gene expression (SAGE) [5] protocols was coupled to an automated, quality-controlled sequence extraction procedure based on Phred quality scores [6]. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
ISSN: 1087-0156, 1546-1696
DOI: 10.1038/nbt718
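
The summary mentions two computational steps: filtering sequenced ligands by Phred quality scores and fitting a quantitative binding-site model to the selected sequences. The sketch below illustrates both in a much simplified form and is not the authors' procedure. It swaps the generalized profile and hidden Markov model training for a plain log-odds weight matrix built from already aligned, equal-length sites. The function names, the quality threshold of 20, the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def phred_error_probability(q):
    """Convert a Phred score to an error probability (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def passes_quality(quals, min_q=20):
    """Accept a read only if every base call reaches min_q.

    The threshold of 20 is an assumed value, not one taken from the article.
    """
    return all(q >= min_q for q in quals)


def build_log_odds(seqs, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from aligned, equal-length sites.

    This is a simplified stand-in for the generalized profile named in the
    summary: one column of base weights per position, no gaps, no HMM training.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sites must have the same length")
    total = len(seqs) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        }
        matrix.append(column)
    return matrix


def score(matrix, site):
    """Sum the per-position weights for a candidate site."""
    if len(site) != len(matrix):
        raise ValueError("site length must match the matrix length")
    return sum(column[base] for column, base in zip(matrix, site))


if __name__ == "__main__":
    # Toy reads with per-base Phred scores; these are invented, not real data.
    reads = [
        ("ACGT", [30, 32, 28, 35]),
        ("ACGT", [25, 40, 31, 22]),
        ("ACGA", [33, 29, 27, 30]),
        ("TTTT", [30, 8, 30, 30]),  # rejected: one low-quality base call
    ]
    kept = [seq for seq, quals in reads if passes_quality(quals)]
    matrix = build_log_odds(kept)
    for candidate in ("ACGT", "ACGA", "TTTT"):
        print(candidate, round(score(matrix, candidate), 3))
```

A weight matrix of this kind treats positions as independent, so it cannot capture the non-independent base preferences that the covariance analysis in the summary reports.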