Sequence features that drive human promoter function and tissue specificity


Bibliographic details
Published in: Genome Research, 2010-07, Vol. 20 (7), pp. 890-898
Main authors: Landolin, Jane M; Johnson, David S; Trinklein, Nathan D; Aldred, Shelly F; Medina, Catherine; Shulha, Hennady; Weng, Zhiping; Myers, Richard M
Format: Article
Language: English
Online access: Full text
container_end_page 898
container_issue 7
container_start_page 890
container_title Genome research
container_volume 20
creator Landolin, Jane M
Johnson, David S
Trinklein, Nathan D
Aldred, Shelly F
Medina, Catherine
Shulha, Hennady
Weng, Zhiping
Myers, Richard M
description Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcription factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
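The abstract scores CG dinucleotide content as a single-feature predictor of promoter activity and reports model performance as ROC AUC. The sketch below is a minimal illustration of those two quantities, not the authors' pipeline: it assumes "CG content" is measured as the fraction of adjacent base pairs that are CG, which may differ from the measure used in the paper, and it computes AUC with the Mann-Whitney (pairwise ranking) formulation. The function names and the toy sequences and labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def cg_dinucleotide_fraction(seq: str) -> float:
    """Fraction of adjacent base pairs in `seq` that are CG (CpG).

    Assumed metric for illustration; the study's CG measure may differ.
    """
    s = seq.upper()
    n_pairs = len(s) - 1
    if n_pairs <= 0:
        return 0.0
    cg = sum(1 for i in range(n_pairs) if s[i] == "C" and s[i + 1] == "G")
    return cg / n_pairs


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    outscores a randomly chosen negative (label 0); ties count as 1/2.

    O(n_pos * n_neg) pairwise comparison, adequate for a small sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy promoters and activity labels (1 = active, 0 = inactive).
    promoters = ["CGCGGCGCGA", "ATATTTAAGC", "GCGCGCCGTA", "TTAGCATATA"]
    active = [1, 0, 1, 0]
    scores = [cg_dinucleotide_fraction(p) for p in promoters]
    print(round(roc_auc(scores, active), 2))  # prints 1.0
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation; on real data a single CG feature would be compared against a multi-feature model in the same way the abstract compares TF-motif models with CG content.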
doi_str_mv 10.1101/gr.100370.109
format Article
pmid 20501695
publisher Cold Spring Harbor Laboratory Press, United States
rights Copyright © 2010 by Cold Spring Harbor Laboratory Press
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892090/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892090/pdf/
identifier ISSN: 1088-9051
ispartof Genome research, 2010-07, Vol.20 (7), p.890-898
issn 1088-9051
1549-5469
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2892090
source PubMed Central Free; MEDLINE; Alma/SFX Local Collection
subjects Base Composition - physiology
Base Sequence - physiology
Binding Sites - genetics
Cell Line
Computational Biology - methods
Epigenesis, Genetic - physiology
Gene Expression - genetics
Gene Expression - physiology
Hep G2 Cells
Hepatocyte Nuclear Factor 4 - genetics
Humans
Organ Specificity - genetics
Promoter Regions, Genetic - genetics
Promoter Regions, Genetic - physiology
Protein Binding
Transcription, Genetic
Transfection
title Sequence features that drive human promoter function and tissue specificity
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T00%3A42%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequence%20features%20that%20drive%20human%20promoter%20function%20and%20tissue%20specificity&rft.jtitle=Genome%20research&rft.au=Landolin,%20Jane%20M&rft.date=2010-07&rft.volume=20&rft.issue=7&rft.spage=890&rft.epage=898&rft.pages=890-898&rft.issn=1088-9051&rft.eissn=1549-5469&rft_id=info:doi/10.1101/gr.100370.109&rft_dat=%3Cproquest_pubme%3E733626193%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=733626193&rft_id=info:pmid/20501695&rfr_iscdi=true