A systematic, large-scale comparison of transcription factor binding site models

The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are...

Bibliographic details
Published in: BMC genomics 2016-05, Vol.17 (373), p.388-388, Article 388
Main authors: Hombach, Daniela; Schwarz, Jana Marie; Robinson, Peter N; Schuelke, Markus; Seelow, Dominik
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 388
container_issue 373
container_start_page 388
container_title BMC genomics
container_volume 17
creator Hombach, Daniela
Schwarz, Jana Marie
Robinson, Peter N
Schuelke, Markus
Seelow, Dominik
description The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristics (ROC) analysis. While the area under the curve was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was high variability between individual TFBSs. Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
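The evaluation outlined in the description (scoring sequences with a position-specific scoring matrix, then summarising how well the scores separate confirmed binding sites from negative controls by the area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the pseudocount, the uniform background, the forward-strand-only scan and the toy counts and sequences are all assumptions made for illustration, and no JASPAR, HT-SELEX, PBM or ENCODE data are used.

```python
import math

BASES = "ACGT"

def log_odds_pssm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (hypothetical helper)."""
    pssm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pssm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm

def best_window_score(pssm, sequence):
    """Highest matrix score over all windows of the sequence (forward strand only)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        sum(pssm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Invented toy counts for a 4-bp motif resembling TGAC (assumption, not real data).
toy_counts = [
    {"T": 8, "A": 1, "C": 1},
    {"G": 9, "A": 1},
    {"A": 8, "C": 1, "G": 1},
    {"C": 9, "T": 1},
]
pssm = log_odds_pssm(toy_counts)

positives = ["AATGACGT", "CCTGACAA"]   # stand-ins for confirmed binding sites
negatives = ["AAAAAAAA", "CCGGCCGG"]   # stand-ins for negative control sequences

auc = roc_auc(
    [best_window_score(pssm, s) for s in positives],
    [best_window_score(pssm, s) for s in negatives],
)
print(f"AUC = {auc:.2f}")
```

On this deliberately easy toy set the matrix separates the two groups completely. The description reports that on real ChIP-seq-derived sites only 47% of the tested models reached an AUC of 0.7 or higher.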
doi_str_mv 10.1186/s12864-016-2729-8
format Article
date 2016-05-21
eissn 1471-2164
pmid 27209209
publisher England: BioMed Central Ltd
rights COPYRIGHT 2016 BioMed Central Ltd.; Hombach et al. 2016
oa free_for_read
toplevel peer_reviewed; online_resources
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/27209209
tpages 1
fulltext fulltext
identifier ISSN: 1471-2164
ispartof BMC genomics, 2016-05, Vol.17 (373), p.388-388, Article 388
issn 1471-2164
1471-2164
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4875604
source MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central Open Access; Springer Nature OA Free Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; SpringerLink Journals - AutoHoldings
subjects Analysis
Binding Sites
Computational Biology
Genetic regulation
Mutation
Polymorphism, Single Nucleotide
Protein binding
Transcription factors
Transcription Factors - chemistry
Transcription Factors - genetics
Transcription Factors - metabolism
title A systematic, large-scale comparison of transcription factor binding site models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T18%3A47%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20systematic,%20large-scale%20comparison%20of%20transcription%20factor%20binding%20site%20models&rft.jtitle=BMC%20genomics&rft.au=Hombach,%20Daniela&rft.date=2016-05-21&rft.volume=17&rft.issue=373&rft.spage=388&rft.epage=388&rft.pages=388-388&rft.artnum=388&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/s12864-016-2729-8&rft_dat=%3Cgale_pubme%3EA452895132%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1790924491&rft_id=info:pmid/27209209&rft_galeid=A452895132&rfr_iscdi=true