De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis
Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequ...
Published in: | Nucleic acids research 2010-06, Vol.38 (11), p.e126-e126 |
---|---|
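Main authors: | Boeva, Valentina; Surdez, Didier; Guillon, Noëlle; Tirode, Franck; Fejes, Anthony P; Delattre, Olivier; Barillot, Emmanuel |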
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e126 |
---|---|
container_issue | 11 |
container_start_page | e126 |
container_title | Nucleic acids research |
container_volume | 38 |
creator | Boeva, Valentina; Surdez, Didier; Guillon, Noëlle; Tirode, Franck; Fejes, Anthony P; Delattre, Olivier; Barillot, Emmanuel |
description | Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb). |
doi_str_mv | 10.1093/nar/gkq217 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2887977</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>733331806</sourcerecordid><originalsourceid>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</originalsourceid><addsrcrecordid>eNqFkktvEzEUhS0Eomlhww9A3iFVHeq3PRukKjxaKRJIwNpyPHcSw4yd2E6kLPjvTEipgA3e3MX9zvG99kHoBSWvKWn5dXT5evV9y6h-hGaUK9aIVrHHaEY4kQ0lwpyh81K-EUIFleIpOmOEa0nadoZ-vAUc0z7hMdXQ49BBnGrwroYUcRg3Oe2h4LoG7LzfZecPOPV4k6ELvoa4wjW7WHwOm1-K3vmaMl6G2B2bJdRJHSKer-8-NZ9hiztXHXbRDYcSyjP0pHdDgef39QJ9ff_uy_y2WXz8cDe_WTReUlYbYYABoe0Sei6V6AG0apmhAMANgx4okdLAUnupBfetm56F9kZw5VqvvOYX6M3Jd7NbjtD5acnsBrvJYXT5YJML9u9ODGu7SnvLjNGtPhpcnQzW_8hubxY2xAJ5tIQJbpRSezrhr-7vy2m7g1LtGIqHYXAR0q5YLaRmppX8_ySfDjVETeTlifQ5lZKhf5iDEnvMgZ1yYE85mOCXfy78gP7-eP4TC1mx1A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>733331806</pqid></control><display><type>article</type><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Access via Oxford University Press (Open Access Collection)</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</creator><creatorcontrib>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</creatorcontrib><description>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. 
This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkq217</identifier><identifier>PMID: 20375099</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Base Sequence ; Binding Sites ; Biochemistry, Molecular Biology ; Cancer ; Cell Line, Tumor ; Chromatin Immunoprecipitation - methods ; Consensus Sequence ; Genomics ; Humans ; Life Sciences ; Methods Online ; Molecular biology ; Oncogene Proteins, Fusion - metabolism ; Proto-Oncogene Protein c-fli-1 - metabolism ; Regulatory Elements, Transcriptional ; RNA-Binding Protein EWS ; Sequence Analysis, DNA ; Transcription Factors - metabolism</subject><ispartof>Nucleic acids research, 2010-06, Vol.38 (11), 
p.e126-e126</ispartof><rights>Attribution - NoDerivatives</rights><rights>The Author(s) 2010. Published by Oxford University Press. 2010</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</citedby><cites>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</cites><orcidid>0000-0003-4731-7817 ; 0000-0002-4382-7185 ; 0000-0002-7118-7859</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/20375099$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://inserm.hal.science/inserm-02438666$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Boeva, Valentina</creatorcontrib><creatorcontrib>Surdez, Didier</creatorcontrib><creatorcontrib>Guillon, Noëlle</creatorcontrib><creatorcontrib>Tirode, Franck</creatorcontrib><creatorcontrib>Fejes, Anthony P</creatorcontrib><creatorcontrib>Delattre, Olivier</creatorcontrib><creatorcontrib>Barillot, Emmanuel</creatorcontrib><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of 
DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. 
Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Binding Sites</subject><subject>Biochemistry, Molecular Biology</subject><subject>Cancer</subject><subject>Cell Line, Tumor</subject><subject>Chromatin Immunoprecipitation - methods</subject><subject>Consensus Sequence</subject><subject>Genomics</subject><subject>Humans</subject><subject>Life Sciences</subject><subject>Methods Online</subject><subject>Molecular biology</subject><subject>Oncogene Proteins, Fusion - metabolism</subject><subject>Proto-Oncogene Protein c-fli-1 - metabolism</subject><subject>Regulatory Elements, Transcriptional</subject><subject>RNA-Binding Protein EWS</subject><subject>Sequence Analysis, DNA</subject><subject>Transcription Factors - metabolism</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkktvEzEUhS0Eomlhww9A3iFVHeq3PRukKjxaKRJIwNpyPHcSw4yd2E6kLPjvTEipgA3e3MX9zvG99kHoBSWvKWn5dXT5evV9y6h-hGaUK9aIVrHHaEY4kQ0lwpyh81K-EUIFleIpOmOEa0nadoZ-vAUc0z7hMdXQ49BBnGrwroYUcRg3Oe2h4LoG7LzfZecPOPV4k6ELvoa4wjW7WHwOm1-K3vmaMl6G2B2bJdRJHSKer-8-NZ9hiztXHXbRDYcSyjP0pHdDgef39QJ9ff_uy_y2WXz8cDe_WTReUlYbYYABoe0Sei6V6AG0apmhAMANgx4okdLAUnupBfetm56F9kZw5VqvvOYX6M3Jd7NbjtD5acnsBrvJYXT5YJML9u9ODGu7SnvLjNGtPhpcnQzW_8hubxY2xAJ5tIQJbpRSezrhr-7vy2m7g1LtGIqHYXAR0q5YLaRmppX8_ySfDjVETeTlifQ5lZKhf5iDEnvMgZ1yYE85mOCXfy78gP7-eP4TC1mx1A</recordid><startdate>20100601</startdate><enddate>20100601</enddate><creator>Boeva, Valentina</creator><creator>Surdez, Didier</creator><creator>Guillon, Noëlle</creator><creator>Tirode, Franck</creator><creator>Fejes, Anthony P</creator><creator>Delattre, Olivier</creator><creator>Barillot, 
Emmanuel</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>1XC</scope><scope>VOOES</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4731-7817</orcidid><orcidid>https://orcid.org/0000-0002-4382-7185</orcidid><orcidid>https://orcid.org/0000-0002-7118-7859</orcidid></search><sort><creationdate>20100601</creationdate><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><author>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Binding Sites</topic><topic>Biochemistry, Molecular Biology</topic><topic>Cancer</topic><topic>Cell Line, Tumor</topic><topic>Chromatin Immunoprecipitation - methods</topic><topic>Consensus Sequence</topic><topic>Genomics</topic><topic>Humans</topic><topic>Life Sciences</topic><topic>Methods Online</topic><topic>Molecular biology</topic><topic>Oncogene Proteins, Fusion - metabolism</topic><topic>Proto-Oncogene Protein c-fli-1 - metabolism</topic><topic>Regulatory Elements, Transcriptional</topic><topic>RNA-Binding Protein EWS</topic><topic>Sequence Analysis, DNA</topic><topic>Transcription Factors - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Boeva, Valentina</creatorcontrib><creatorcontrib>Surdez, 
Didier</creatorcontrib><creatorcontrib>Guillon, Noëlle</creatorcontrib><creatorcontrib>Tirode, Franck</creatorcontrib><creatorcontrib>Fejes, Anthony P</creatorcontrib><creatorcontrib>Delattre, Olivier</creatorcontrib><creatorcontrib>Barillot, Emmanuel</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Boeva, Valentina</au><au>Surdez, Didier</au><au>Guillon, Noëlle</au><au>Tirode, Franck</au><au>Fejes, Anthony P</au><au>Delattre, Olivier</au><au>Barillot, Emmanuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2010-06-01</date><risdate>2010</risdate><volume>38</volume><issue>11</issue><spage>e126</spage><epage>e126</epage><pages>e126-e126</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><abstract>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. 
This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>20375099</pmid><doi>10.1093/nar/gkq217</doi><orcidid>https://orcid.org/0000-0003-4731-7817</orcidid><orcidid>https://orcid.org/0000-0002-4382-7185</orcidid><orcidid>https://orcid.org/0000-0002-7118-7859</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2010-06, Vol.38 (11), p.e126-e126 |
issn | 0305-1048; 1362-4962 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2887977 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Access via Oxford University Press (Open Access Collection); PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Base Sequence; Binding Sites; Biochemistry, Molecular Biology; Cancer; Cell Line, Tumor; Chromatin Immunoprecipitation - methods; Consensus Sequence; Genomics; Humans; Life Sciences; Methods Online; Molecular biology; Oncogene Proteins, Fusion - metabolism; Proto-Oncogene Protein c-fli-1 - metabolism; Regulatory Elements, Transcriptional; RNA-Binding Protein EWS; Sequence Analysis, DNA; Transcription Factors - metabolism |
title | De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T05%3A03%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=De%20novo%20motif%20identification%20improves%20the%20accuracy%20of%20predicting%20transcription%20factor%20binding%20sites%20in%20ChIP-Seq%20data%20analysis&rft.jtitle=Nucleic%20acids%20research&rft.au=Boeva,%20Valentina&rft.date=2010-06-01&rft.volume=38&rft.issue=11&rft.spage=e126&rft.epage=e126&rft.pages=e126-e126&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkq217&rft_dat=%3Cproquest_pubme%3E733331806%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=733331806&rft_id=info:pmid/20375099&rfr_iscdi=true |