Clustering of DNA sequences in human promoters

We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and mo...

Detailed description

Bibliographic details
Published in: Genome research 2004-08, Vol.14 (8), p.1562-1574
Main authors: FitzGerald, Peter C, Shlyakhtenko, Andrey, Mir, Alain A, Vinson, Charles
Format: Article
Language: eng
Online access: Full text
container_end_page 1574
container_issue 8
container_start_page 1562
container_title Genome research
container_volume 14
creator FitzGerald, Peter C
Shlyakhtenko, Andrey
Mir, Alain A
Vinson, Charles
description We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.
doi_str_mv 10.1101/gr.1953904
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_509265</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>66763895</sourcerecordid><originalsourceid>FETCH-LOGICAL-c439t-14a3d518790b5814685eed648b89b46c72c048d7d164f67b2834cb8e2efc8053</originalsourceid><addsrcrecordid>eNpVkLtOwzAUhj2AaCksPADKxICUYseX2AND1XKTKli6W45zkgYldrETJN6eVo24TGc4338uH0JXBM8JweSuDnOiOFWYnaApwVKmCnMyQecxvmOMKZPyDE0Iz7jghE_RfNkOsYfQuDrxVbJ6XSQRPgZwFmLSuGQ7dMYlu-A7v6fiBTqtTBvhcqwztHl82Cyf0_Xb08tysU4to6pPCTO05ETmChdcEiYkBygFk4VUBRM2zyxmssxLIlgl8iKTlNlCQgaVlZjTGbo_jt0NRQelBdcH0-pdaDoTvrQ3jf7fcc1W1_5Tc6wyccjfjPng98_EXndNtNC2xoEfohYiF1SqA3h7BG3wMQaofnYQrA9CdR30KHQPX_-96hcdbdJvu9Vylw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>66763895</pqid></control><display><type>article</type><title>Clustering of DNA sequences in human promoters</title><source>MEDLINE</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>FitzGerald, Peter C ; Shlyakhtenko, Andrey ; Mir, Alain A ; Vinson, Charles</creator><creatorcontrib>FitzGerald, Peter C ; Shlyakhtenko, Andrey ; Mir, Alain A ; Vinson, Charles</creatorcontrib><description>We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. 
One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.</description><identifier>ISSN: 1088-9051</identifier><identifier>DOI: 10.1101/gr.1953904</identifier><identifier>PMID: 15256515</identifier><language>eng</language><publisher>United States: Cold Spring Harbor Laboratory Press</publisher><subject>Base Sequence ; Cluster Analysis ; Computational Biology - methods ; Consensus Sequence ; Humans ; Letters ; Models, Genetic ; Molecular Sequence Data ; Promoter Regions, Genetic ; Transcription Initiation Site</subject><ispartof>Genome research, 2004-08, Vol.14 (8), p.1562-1574</ispartof><rights>Copyright 2004 Cold Spring Harbor Laboratory Press ISSN</rights><rights>Copyright © 2004, Cold Spring Harbor Laboratory Press 
2004</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c439t-14a3d518790b5814685eed648b89b46c72c048d7d164f67b2834cb8e2efc8053</citedby><cites>FETCH-LOGICAL-c439t-14a3d518790b5814685eed648b89b46c72c048d7d164f67b2834cb8e2efc8053</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC509265/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC509265/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/15256515$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>FitzGerald, Peter C</creatorcontrib><creatorcontrib>Shlyakhtenko, Andrey</creatorcontrib><creatorcontrib>Mir, Alain A</creatorcontrib><creatorcontrib>Vinson, Charles</creatorcontrib><title>Clustering of DNA sequences in human promoters</title><title>Genome research</title><addtitle>Genome Res</addtitle><description>We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. 
One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.</description><subject>Base Sequence</subject><subject>Cluster Analysis</subject><subject>Computational Biology - methods</subject><subject>Consensus Sequence</subject><subject>Humans</subject><subject>Letters</subject><subject>Models, Genetic</subject><subject>Molecular Sequence Data</subject><subject>Promoter Regions, Genetic</subject><subject>Transcription Initiation Site</subject><issn>1088-9051</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVkLtOwzAUhj2AaCksPADKxICUYseX2AND1XKTKli6W45zkgYldrETJN6eVo24TGc4338uH0JXBM8JweSuDnOiOFWYnaApwVKmCnMyQecxvmOMKZPyDE0Iz7jghE_RfNkOsYfQuDrxVbJ6XSQRPgZwFmLSuGQ7dMYlu-A7v6fiBTqtTBvhcqwztHl82Cyf0_Xb08tysU4to6pPCTO05ETmChdcEiYkBygFk4VUBRM2zyxmssxLIlgl8iKTlNlCQgaVlZjTGbo_jt0NRQelBdcH0-pdaDoTvrQ3jf7fcc1W1_5Tc6wyccjfjPng98_EXndNtNC2xoEfohYiF1SqA3h7BG3wMQaofnYQrA9CdR30KHQPX_-96hcdbdJvu9Vylw</recordid><startdate>20040801</startdate><enddate>20040801</enddate><creator>FitzGerald, Peter C</creator><creator>Shlyakhtenko, Andrey</creator><creator>Mir, Alain A</creator><creator>Vinson, Charles</creator><general>Cold Spring Harbor Laboratory 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20040801</creationdate><title>Clustering of DNA sequences in human promoters</title><author>FitzGerald, Peter C ; Shlyakhtenko, Andrey ; Mir, Alain A ; Vinson, Charles</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c439t-14a3d518790b5814685eed648b89b46c72c048d7d164f67b2834cb8e2efc8053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Base Sequence</topic><topic>Cluster Analysis</topic><topic>Computational Biology - methods</topic><topic>Consensus Sequence</topic><topic>Humans</topic><topic>Letters</topic><topic>Models, Genetic</topic><topic>Molecular Sequence Data</topic><topic>Promoter Regions, Genetic</topic><topic>Transcription Initiation Site</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>FitzGerald, Peter C</creatorcontrib><creatorcontrib>Shlyakhtenko, Andrey</creatorcontrib><creatorcontrib>Mir, Alain A</creatorcontrib><creatorcontrib>Vinson, Charles</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>FitzGerald, Peter C</au><au>Shlyakhtenko, Andrey</au><au>Mir, Alain A</au><au>Vinson, Charles</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering of DNA sequences in human 
promoters</atitle><jtitle>Genome research</jtitle><addtitle>Genome Res</addtitle><date>2004-08-01</date><risdate>2004</risdate><volume>14</volume><issue>8</issue><spage>1562</spage><epage>1574</epage><pages>1562-1574</pages><issn>1088-9051</issn><abstract>We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.</abstract><cop>United States</cop><pub>Cold Spring Harbor Laboratory Press</pub><pmid>15256515</pmid><doi>10.1101/gr.1953904</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1088-9051
ispartof Genome research, 2004-08, Vol.14 (8), p.1562-1574
issn 1088-9051
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_509265
source MEDLINE; PubMed Central; Alma/SFX Local Collection
subjects Base Sequence
Cluster Analysis
Computational Biology - methods
Consensus Sequence
Humans
Letters
Models, Genetic
Molecular Sequence Data
Promoter Regions, Genetic
Transcription Initiation Site
title Clustering of DNA sequences in human promoters
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T15%3A00%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20DNA%20sequences%20in%20human%20promoters&rft.jtitle=Genome%20research&rft.au=FitzGerald,%20Peter%20C&rft.date=2004-08-01&rft.volume=14&rft.issue=8&rft.spage=1562&rft.epage=1574&rft.pages=1562-1574&rft.issn=1088-9051&rft_id=info:doi/10.1101/gr.1953904&rft_dat=%3Cproquest_pubme%3E66763895%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=66763895&rft_id=info:pmid/15256515&rfr_iscdi=true
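
The description field above summarizes a positional 8-mer count over promoters aligned at the TSS. The following is a minimal sketch of that kind of count, not the authors' implementation: the function name `kmer_position_counts`, the `tss_index` parameter, and the toy sequences are all hypothetical, and the statistical test for clustering mentioned in the abstract is not reproduced.

```python
from collections import Counter

K = 8  # word length stated in the abstract (8-mers)

def kmer_position_counts(promoters, tss_index, k=K):
    """Count how often each k-mer starts at each offset relative to the TSS.

    promoters: iterable of uppercase DNA strings, assumed to be aligned so
               that index `tss_index` is the putative TSS in every sequence.
    Returns a dict mapping k-mer -> Counter of {offset_from_tss: count}.
    """
    counts = {}
    for seq in promoters:
        for start in range(len(seq) - k + 1):
            word = seq[start:start + k]
            # Skip words containing N or any other non-ACGT symbol.
            if set(word) <= set("ACGT"):
                counts.setdefault(word, Counter())[start - tss_index] += 1
    return counts

# A four-letter alphabet gives 4**8 = 65,536 possible 8-mers, as in the abstract.
assert 4 ** K == 65536

# Toy input: hypothetical sequences, not data from the paper.
toy = ["GGGGTATAAAAGGGG", "CCCCTATAAAAGCCC"]
result = kmer_position_counts(toy, tss_index=10)
print(result["TATAAAAG"])
```

A peak in the per-offset counts of a given 8-mer is what the abstract calls a cluster; deciding whether such a peak is statistically significant would need a background model that this sketch does not include.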