A systematic, large-scale comparison of transcription factor binding site models
The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are...
Published in: | BMC genomics 2016-05, Vol.17 (373), p.388-388, Article 388 |
---|---|
Main authors: | Hombach, Daniela; Schwarz, Jana Marie; Robinson, Peter N; Schuelke, Markus; Seelow, Dominik |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 388 |
---|---|
container_issue | 373 |
container_start_page | 388 |
container_title | BMC genomics |
container_volume | 17 |
creator | Hombach, Daniela; Schwarz, Jana Marie; Robinson, Peter N; Schuelke, Markus; Seelow, Dominik |
description | The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBS. All models were assessed by receiver operating characteristics (ROC) analysis.
While the area-under-curve was low for most of the tested models with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs.
Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM ( http://mutationtaster.charite.de/ePOSSUM/ ), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites. |
doi_str_mv | 10.1186/s12864-016-2729-8 |
format | Article |
fullrecord | PMID: 27209209. Publisher: BioMed Central Ltd (England). Publication date: 2016-05-21. Rights: COPYRIGHT 2016 BioMed Central Ltd.; Hombach et al. 2016. Peer reviewed; free to read. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/ |
fulltext | fulltext |
identifier | ISSN: 1471-2164 |
ispartof | BMC genomics, 2016-05, Vol.17 (373), p.388-388, Article 388 |
issn | 1471-2164 (ISSN); 1471-2164 (EISSN) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4875604 |
source | MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central Open Access; Springer Nature OA Free Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; SpringerLink Journals - AutoHoldings |
subjects | Analysis; Binding Sites; Computational Biology; Genetic regulation; Mutation; Polymorphism, Single Nucleotide; Protein binding; Transcription factors; Transcription Factors - chemistry; Transcription Factors - genetics; Transcription Factors - metabolism |
title | A systematic, large-scale comparison of transcription factor binding site models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T18%3A47%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20systematic,%20large-scale%20comparison%20of%20transcription%20factor%20binding%20site%20models&rft.jtitle=BMC%20genomics&rft.au=Hombach,%20Daniela&rft.date=2016-05-21&rft.volume=17&rft.issue=373&rft.spage=388&rft.epage=388&rft.pages=388-388&rft.artnum=388&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/s12864-016-2729-8&rft_dat=%3Cgale_pubme%3EA452895132%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1790924491&rft_id=info:pmid/27209209&rft_galeid=A452895132&rfr_iscdi=true |
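
The evaluation summarised in the description (scoring sequences with a position-specific scoring matrix and judging the scores by ROC analysis) can be illustrated with a small sketch. This is not the authors' pipeline or the ePOSSUM implementation: the count matrix, the sequences, the pseudocount, the uniform background and all function names are hypothetical, chosen only to show how an area-under-curve value arises from positive (bound) and negative (control) sequences.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency; real models may differ.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def best_window_score(matrix, sequence):
    """Return the highest matrix score over all windows of the sequence."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, with ties counted as one half.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical toy motif "TGAC" observed in 10 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pssm = log_odds_matrix(toy_counts)

# Hypothetical stand-ins for bound sites and negative controls.
positives = ["AATGACGT", "CCTGACAA", "TGACTTTT"]
negatives = ["AAAAAAAA", "CCCCGGGG", "ACACACAC"]

auc = roc_auc(
    [best_window_score(pssm, s) for s in positives],
    [best_window_score(pssm, s) for s in negatives],
)
print(f"AUC = {auc:.2f}")
```

With these toy inputs every positive sequence contains the full motif and outscores every negative, so the AUC is 1.0. The description reports that only 47 % of the tested models reached 0.7 or higher on real data.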