Identification of the Human DPR Promoter Element by using Machine Learning

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation [1-5], but the downstream core promoter in humans has been difficult to decipher [1-3]. Here, we analyze the human Pol II core promoter and use machine learning to...

Bibliographic details
Published in: Nature (London), 2020-09, Vol. 585 (7825), p. 459-463
Main authors: Vo ngoc, Long; Huang, Cassidy Yunjing; Cassidy, California Jack; Medrano, Claudia; Kadonaga, James T.
Format: Article
Language: eng
Online access: Full text
container_end_page 463
container_issue 7825
container_start_page 459
container_title Nature (London)
container_volume 585
creator Vo ngoc, Long
Huang, Cassidy Yunjing
Cassidy, California Jack
Medrano, Claudia
Kadonaga, James T.
description The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation [1-5], but the downstream core promoter in humans has been difficult to decipher [1-3]. Here, we analyze the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants that are each of known transcriptional strength. We then analyzed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These studies revealed that the DPR is a functionally important core promoter element that is widely used in human promoters. Importantly, there appears to be a duality between the DPR and TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.
doi_str_mv 10.1038/s41586-020-2689-7
format Article
fullrecord <record><control><sourceid>pubmedcentral</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7501168</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>pubmedcentral_primary_oai_pubmedcentral_nih_gov_7501168</sourcerecordid><originalsourceid>FETCH-pubmedcentral_primary_oai_pubmedcentral_nih_gov_75011683</originalsourceid><addsrcrecordid>eNqljMtOwzAURK8QiIbHB7C7P2C4zsN2N2ygqCCQKsTeclOnMYrtyk6Q-vdkwaZrViOdOTMAd5zuOVXqIde8UYJRSawUasnkGRS8loLVQslzKIhKxUhVYgFXOX8TUcNlfQmLqlzOmJoC3l53Noyuc60ZXQwYOxx7i-vJm4DPm0_cpOjjaBOuButnFbdHnLILe_wwbe-CxXdrUpjBDVx0Zsj29i-v4fFl9fW0Zodp6-2uncfJDPqQnDfpqKNx-rQJrtf7-KNlQ5wLVf374BduUVt9</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Identification of the Human DPR Promoter Element by using Machine Learning</title><source>Nature</source><source>SpringerLink Journals - AutoHoldings</source><creator>Vo ngoc, Long ; Huang, Cassidy Yunjing ; Cassidy, California Jack ; Medrano, Claudia ; Kadonaga, James T.</creator><creatorcontrib>Vo ngoc, Long ; Huang, Cassidy Yunjing ; Cassidy, California Jack ; Medrano, Claudia ; Kadonaga, James T.</creatorcontrib><description>The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation 1 - 5 , but the downstream core promoter in humans has been difficult to decipher 1 - 3 . Here, we analyze the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants that are each of known transcriptional strength. 
We then analyzed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These studies revealed that the DPR is a functionally important core promoter element that is widely used in human promoters. Importantly, there appears to be a duality between the DPR and TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.</description><identifier>ISSN: 0028-0836</identifier><identifier>EISSN: 1476-4687</identifier><identifier>DOI: 10.1038/s41586-020-2689-7</identifier><identifier>PMID: 32908305</identifier><language>eng</language><ispartof>Nature (London), 2020-09, Vol.585 (7825), p.459-463</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,27903,27904</link.rule.ids></links><search><creatorcontrib>Vo ngoc, Long</creatorcontrib><creatorcontrib>Huang, Cassidy Yunjing</creatorcontrib><creatorcontrib>Cassidy, California Jack</creatorcontrib><creatorcontrib>Medrano, Claudia</creatorcontrib><creatorcontrib>Kadonaga, James T.</creatorcontrib><title>Identification of the Human DPR Promoter Element by using Machine Learning</title><title>Nature (London)</title><description>The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation 1 - 5 , but the downstream core promoter in humans has been difficult to decipher 1 - 3 . 
Here, we analyze the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants that are each of known transcriptional strength. We then analyzed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These studies revealed that the DPR is a functionally important core promoter element that is widely used in human promoters. Importantly, there appears to be a duality between the DPR and TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.</description><issn>0028-0836</issn><issn>1476-4687</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNqljMtOwzAURK8QiIbHB7C7P2C4zsN2N2ygqCCQKsTeclOnMYrtyk6Q-vdkwaZrViOdOTMAd5zuOVXqIde8UYJRSawUasnkGRS8loLVQslzKIhKxUhVYgFXOX8TUcNlfQmLqlzOmJoC3l53Noyuc60ZXQwYOxx7i-vJm4DPm0_cpOjjaBOuButnFbdHnLILe_wwbe-CxXdrUpjBDVx0Zsj29i-v4fFl9fW0Zodp6-2uncfJDPqQnDfpqKNx-rQJrtf7-KNlQ5wLVf374BduUVt9</recordid><startdate>20200901</startdate><enddate>20200901</enddate><creator>Vo ngoc, Long</creator><creator>Huang, Cassidy Yunjing</creator><creator>Cassidy, California Jack</creator><creator>Medrano, Claudia</creator><creator>Kadonaga, James T.</creator><scope>5PM</scope></search><sort><creationdate>20200901</creationdate><title>Identification of the Human DPR Promoter Element by using Machine Learning</title><author>Vo ngoc, Long ; Huang, Cassidy Yunjing ; Cassidy, California Jack ; Medrano, 
Claudia ; Kadonaga, James T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-pubmedcentral_primary_oai_pubmedcentral_nih_gov_75011683</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vo ngoc, Long</creatorcontrib><creatorcontrib>Huang, Cassidy Yunjing</creatorcontrib><creatorcontrib>Cassidy, California Jack</creatorcontrib><creatorcontrib>Medrano, Claudia</creatorcontrib><creatorcontrib>Kadonaga, James T.</creatorcontrib><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nature (London)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vo ngoc, Long</au><au>Huang, Cassidy Yunjing</au><au>Cassidy, California Jack</au><au>Medrano, Claudia</au><au>Kadonaga, James T.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identification of the Human DPR Promoter Element by using Machine Learning</atitle><jtitle>Nature (London)</jtitle><date>2020-09-01</date><risdate>2020</risdate><volume>585</volume><issue>7825</issue><spage>459</spage><epage>463</epage><pages>459-463</pages><issn>0028-0836</issn><eissn>1476-4687</eissn><abstract>The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation 1 - 5 , but the downstream core promoter in humans has been difficult to decipher 1 - 3 . Here, we analyze the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants that are each of known transcriptional strength. 
We then analyzed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These studies revealed that the DPR is a functionally important core promoter element that is widely used in human promoters. Importantly, there appears to be a duality between the DPR and TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.</abstract><pmid>32908305</pmid><doi>10.1038/s41586-020-2689-7</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0028-0836
ispartof Nature (London), 2020-09, Vol.585 (7825), p.459-463
issn 0028-0836
1476-4687
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7501168
source Nature; SpringerLink Journals - AutoHoldings
title Identification of the Human DPR Promoter Element by using Machine Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T19%3A29%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmedcentral&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identification%20of%20the%20Human%20DPR%20Promoter%20Element%20by%20using%20Machine%20Learning&rft.jtitle=Nature%20(London)&rft.au=Vo%20ngoc,%20Long&rft.date=2020-09-01&rft.volume=585&rft.issue=7825&rft.spage=459&rft.epage=463&rft.pages=459-463&rft.issn=0028-0836&rft.eissn=1476-4687&rft_id=info:doi/10.1038/s41586-020-2689-7&rft_dat=%3Cpubmedcentral%3Epubmedcentral_primary_oai_pubmedcentral_nih_gov_7501168%3C/pubmedcentral%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/32908305&rfr_iscdi=true