Stability selection for regression-based models of transcription factor-DNA binding specificity

The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, mo...

Bibliographic details
Published in:	Bioinformatics 2013-07, Vol.29 (13), p.i117-i125
Main authors:	Mordelet, Fantine; Horton, John; Hartemink, Alexander J; Engelhardt, Barbara E; Gordân, Raluca
Format:	Article
Language:	eng
Subjects:	Affinity; Algorithms; Binding; Binding Sites; Bioinformatics; Deoxyribonucleic acid; DNA - chemistry; DNA - metabolism; Genome; Human; Humans; Linear Models; Mathematical models; Protein Array Analysis; Protein Binding; Regression; Saccharomyces cerevisiae Proteins - metabolism; Support Vector Machine; Transcription Factors - metabolism
Online access:	Full text

container_end_page	i125
container_issue	13
container_start_page	i117
container_title	Bioinformatics
container_volume	29
creator	Mordelet, Fantine; Horton, John; Hartemink, Alexander J; Engelhardt, Barbara E; Gordân, Raluca
description	The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TF with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
doi_str_mv	10.1093/bioinformatics/btt221
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3694650</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1412559999</sourcerecordid><originalsourceid>FETCH-LOGICAL-c477t-85de22417a2bd46b6166ecfe83bfe74f771e1009a7974e4035d7fdca208f15b03</originalsourceid><addsrcrecordid>eNqFkU1P3DAQhi1UxFf5CUU59pLi8WdyQUIU2kqoHNqeLdsZb42SeLG9SPz7Bi2syglfxtY882isl5BPQL8A7fm5iynOIeXJ1ujLuauVMdgjR8CVbkUH8GF3p_yQHJdyTymVVKoDcsh4B6zX8oiYX9W6OMb61BQc0deY5mbRNhlXGUtZnq2zBYdmSgOOpUmhqdnOxee43sLW15Tbrz8vGxfnIc6rpqzRxxD9Yv1I9oMdC56-1BPy5-b699X39vbu24-ry9vWC61r28kBGROgLXODUE6BUugDdtwF1CJoDQiU9lb3WqCgXA46DN4y2gWQjvITcrH1rjduwsHjvGw5mnWOk81PJtlo3nbm-Nes0qPhqhdKPgs-vwhyethgqWaKxeM42hnTphhQGqQC1vH3UQFMyn4576Ncc8E5lXpB5Rb1OZWSMeyWB2qeIzdvIzfbyJe5s_9_vpt6zZj_AwZir5s</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1373433057</pqid></control><display><type>article</type><title>Stability selection for regression-based models of transcription factor-DNA binding specificity</title><source>MEDLINE</source><source>Access via Oxford University Press (Open Access Collection)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Mordelet, Fantine ; Horton, John ; Hartemink, Alexander J ; Engelhardt, Barbara E ; Gordân, Raluca</creator><creatorcontrib>Mordelet, Fantine ; Horton, John ; Hartemink, Alexander J ; Engelhardt, Barbara E ; Gordân, Raluca</creatorcontrib><description>The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. 
For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TF with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. 
The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>EISSN: 1460-2059</identifier><identifier>DOI: 10.1093/bioinformatics/btt221</identifier><identifier>PMID: 23812975</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Affinity ; Algorithms ; Binding ; Binding Sites ; Bioinformatics ; Deoxyribonucleic acid ; DNA - chemistry ; DNA - metabolism ; Genome ; Human ; Humans ; Linear Models ; Mathematical models ; Protein Array Analysis ; Protein Binding ; Regression ; Saccharomyces cerevisiae Proteins - metabolism ; Support Vector Machine ; Transcription Factors - metabolism</subject><ispartof>Bioinformatics, 2013-07, Vol.29 (13), p.i117-i125</ispartof><rights>The Author 2013. Published by Oxford University Press. 2013</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c477t-85de22417a2bd46b6166ecfe83bfe74f771e1009a7974e4035d7fdca208f15b03</citedby><cites>FETCH-LOGICAL-c477t-85de22417a2bd46b6166ecfe83bfe74f771e1009a7974e4035d7fdca208f15b03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694650/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694650/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23812975$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mordelet, Fantine</creatorcontrib><creatorcontrib>Horton, 
John</creatorcontrib><creatorcontrib>Hartemink, Alexander J</creatorcontrib><creatorcontrib>Engelhardt, Barbara E</creatorcontrib><creatorcontrib>Gordân, Raluca</creatorcontrib><title>Stability selection for regression-based models of transcription factor-DNA binding specificity</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TF with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. 
Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.</description><subject>Affinity</subject><subject>Algorithms</subject><subject>Binding</subject><subject>Binding Sites</subject><subject>Bioinformatics</subject><subject>Deoxyribonucleic acid</subject><subject>DNA - chemistry</subject><subject>DNA - metabolism</subject><subject>Genome</subject><subject>Human</subject><subject>Humans</subject><subject>Linear Models</subject><subject>Mathematical models</subject><subject>Protein Array Analysis</subject><subject>Protein Binding</subject><subject>Regression</subject><subject>Saccharomyces cerevisiae Proteins - metabolism</subject><subject>Support Vector Machine</subject><subject>Transcription Factors - metabolism</subject><issn>1367-4803</issn><issn>1367-4811</issn><issn>1460-2059</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU1P3DAQhi1UxFf5CUU59pLi8WdyQUIU2kqoHNqeLdsZb42SeLG9SPz7Bi2syglfxtY882isl5BPQL8A7fm5iynOIeXJ1ujLuauVMdgjR8CVbkUH8GF3p_yQHJdyTymVVKoDcsh4B6zX8oiYX9W6OMb61BQc0deY5mbRNhlXGUtZnq2zBYdmSgOOpUmhqdnOxee43sLW15Tbrz8vGxfnIc6rpqzRxxD9Yv1I9oMdC56-1BPy5-b699X39vbu24-ry9vWC61r28kBGROgLXODUE6BUugDdtwF1CJoDQiU9lb3WqCgXA46DN4y2gWQjvITcrH1rjduwsHjvGw5mnWOk81PJtlo3nbm-Nes0qPhqhdKPgs-vwhyethgqWaKxeM42hnTphhQGqQC1vH3UQFMyn4576Ncc8E5lXpB5Rb1OZWSMeyWB2qeIzdvIzfbyJe5s_9_vpt6zZj_AwZir5s</recordid><startdate>20130701</startdate><enddate>20130701</enddate><creator>Mordelet, Fantine</creator><creator>Horton, John</creator><creator>Hartemink, Alexander J</creator><creator>Engelhardt, Barbara E</creator><creator>Gordân, Raluca</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7QO</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>7SC</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>5PM</scope></search><sort><creationdate>20130701</creationdate><title>Stability selection for regression-based models of transcription factor-DNA binding specificity</title><author>Mordelet, Fantine ; Horton, John ; Hartemink, Alexander J ; Engelhardt, Barbara E ; Gordân, Raluca</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c477t-85de22417a2bd46b6166ecfe83bfe74f771e1009a7974e4035d7fdca208f15b03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Affinity</topic><topic>Algorithms</topic><topic>Binding</topic><topic>Binding Sites</topic><topic>Bioinformatics</topic><topic>Deoxyribonucleic acid</topic><topic>DNA - chemistry</topic><topic>DNA - metabolism</topic><topic>Genome</topic><topic>Human</topic><topic>Humans</topic><topic>Linear Models</topic><topic>Mathematical models</topic><topic>Protein Array Analysis</topic><topic>Protein Binding</topic><topic>Regression</topic><topic>Saccharomyces cerevisiae Proteins - metabolism</topic><topic>Support Vector Machine</topic><topic>Transcription Factors - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mordelet, Fantine</creatorcontrib><creatorcontrib>Horton, John</creatorcontrib><creatorcontrib>Hartemink, Alexander J</creatorcontrib><creatorcontrib>Engelhardt, Barbara E</creatorcontrib><creatorcontrib>Gordân, Raluca</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Biotechnology Research Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mordelet, Fantine</au><au>Horton, John</au><au>Hartemink, Alexander J</au><au>Engelhardt, Barbara E</au><au>Gordân, Raluca</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Stability selection for regression-based models of transcription factor-DNA binding specificity</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2013-07-01</date><risdate>2013</risdate><volume>29</volume><issue>13</issue><spage>i117</spage><epage>i125</epage><pages>i117-i125</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><eissn>1460-2059</eissn><abstract>The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. 
However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TF with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>23812975</pmid><doi>10.1093/bioinformatics/btt221</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics, 2013-07, Vol.29 (13), p.i117-i125
issn	1367-4803; 1367-4811; 1460-2059
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3694650
source	MEDLINE; Access via Oxford University Press (Open Access Collection); EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects	Affinity; Algorithms; Binding; Binding Sites; Bioinformatics; Deoxyribonucleic acid; DNA - chemistry; DNA - metabolism; Genome; Human; Humans; Linear Models; Mathematical models; Protein Array Analysis; Protein Binding; Regression; Saccharomyces cerevisiae Proteins - metabolism; Support Vector Machine; Transcription Factors - metabolism
title	Stability selection for regression-based models of transcription factor-DNA binding specificity
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T23%3A04%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Stability%20selection%20for%20regression-based%20models%20of%20transcription%20factor-DNA%20binding%20specificity&rft.jtitle=Bioinformatics&rft.au=Mordelet,%20Fantine&rft.date=2013-07-01&rft.volume=29&rft.issue=13&rft.spage=i117&rft.epage=i125&rft.pages=i117-i125&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btt221&rft_dat=%3Cproquest_pubme%3E1412559999%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1373433057&rft_id=info:pmid/23812975&rfr_iscdi=true