De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequ...

Bibliographic details
Published in: Nucleic acids research 2010-06, Vol.38 (11), p.e126-e126
Main authors: Boeva, Valentina, Surdez, Didier, Guillon, Noëlle, Tirode, Franck, Fejes, Anthony P, Delattre, Olivier, Barillot, Emmanuel
Format: Article
Language: eng
Online access: Full text
container_end_page e126
container_issue 11
container_start_page e126
container_title Nucleic acids research
container_volume 38
creator Boeva, Valentina
Surdez, Didier
Guillon, Noëlle
Tirode, Franck
Fejes, Anthony P
Delattre, Olivier
Barillot, Emmanuel
description Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).
doi_str_mv 10.1093/nar/gkq217
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2887977</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>733331806</sourcerecordid><originalsourceid>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</originalsourceid><addsrcrecordid>eNqFkktvEzEUhS0Eomlhww9A3iFVHeq3PRukKjxaKRJIwNpyPHcSw4yd2E6kLPjvTEipgA3e3MX9zvG99kHoBSWvKWn5dXT5evV9y6h-hGaUK9aIVrHHaEY4kQ0lwpyh81K-EUIFleIpOmOEa0nadoZ-vAUc0z7hMdXQ49BBnGrwroYUcRg3Oe2h4LoG7LzfZecPOPV4k6ELvoa4wjW7WHwOm1-K3vmaMl6G2B2bJdRJHSKer-8-NZ9hiztXHXbRDYcSyjP0pHdDgef39QJ9ff_uy_y2WXz8cDe_WTReUlYbYYABoe0Sei6V6AG0apmhAMANgx4okdLAUnupBfetm56F9kZw5VqvvOYX6M3Jd7NbjtD5acnsBrvJYXT5YJML9u9ODGu7SnvLjNGtPhpcnQzW_8hubxY2xAJ5tIQJbpRSezrhr-7vy2m7g1LtGIqHYXAR0q5YLaRmppX8_ySfDjVETeTlifQ5lZKhf5iDEnvMgZ1yYE85mOCXfy78gP7-eP4TC1mx1A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>733331806</pqid></control><display><type>article</type><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Access via Oxford University Press (Open Access Collection)</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</creator><creatorcontrib>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</creatorcontrib><description>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. 
This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered &gt;2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkq217</identifier><identifier>PMID: 20375099</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Base Sequence ; Binding Sites ; Biochemistry, Molecular Biology ; Cancer ; Cell Line, Tumor ; Chromatin Immunoprecipitation - methods ; Consensus Sequence ; Genomics ; Humans ; Life Sciences ; Methods Online ; Molecular biology ; Oncogene Proteins, Fusion - metabolism ; Proto-Oncogene Protein c-fli-1 - metabolism ; Regulatory Elements, Transcriptional ; RNA-Binding Protein EWS ; Sequence Analysis, DNA ; Transcription Factors - metabolism</subject><ispartof>Nucleic acids research, 2010-06, Vol.38 (11), 
p.e126-e126</ispartof><rights>Attribution - NoDerivatives</rights><rights>The Author(s) 2010. Published by Oxford University Press. 2010</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</citedby><cites>FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</cites><orcidid>0000-0003-4731-7817 ; 0000-0002-4382-7185 ; 0000-0002-7118-7859</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/20375099$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://inserm.hal.science/inserm-02438666$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Boeva, Valentina</creatorcontrib><creatorcontrib>Surdez, Didier</creatorcontrib><creatorcontrib>Guillon, Noëlle</creatorcontrib><creatorcontrib>Tirode, Franck</creatorcontrib><creatorcontrib>Fejes, Anthony P</creatorcontrib><creatorcontrib>Delattre, Olivier</creatorcontrib><creatorcontrib>Barillot, Emmanuel</creatorcontrib><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of 
DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered &gt;2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. 
Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Binding Sites</subject><subject>Biochemistry, Molecular Biology</subject><subject>Cancer</subject><subject>Cell Line, Tumor</subject><subject>Chromatin Immunoprecipitation - methods</subject><subject>Consensus Sequence</subject><subject>Genomics</subject><subject>Humans</subject><subject>Life Sciences</subject><subject>Methods Online</subject><subject>Molecular biology</subject><subject>Oncogene Proteins, Fusion - metabolism</subject><subject>Proto-Oncogene Protein c-fli-1 - metabolism</subject><subject>Regulatory Elements, Transcriptional</subject><subject>RNA-Binding Protein EWS</subject><subject>Sequence Analysis, DNA</subject><subject>Transcription Factors - metabolism</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkktvEzEUhS0Eomlhww9A3iFVHeq3PRukKjxaKRJIwNpyPHcSw4yd2E6kLPjvTEipgA3e3MX9zvG99kHoBSWvKWn5dXT5evV9y6h-hGaUK9aIVrHHaEY4kQ0lwpyh81K-EUIFleIpOmOEa0nadoZ-vAUc0z7hMdXQ49BBnGrwroYUcRg3Oe2h4LoG7LzfZecPOPV4k6ELvoa4wjW7WHwOm1-K3vmaMl6G2B2bJdRJHSKer-8-NZ9hiztXHXbRDYcSyjP0pHdDgef39QJ9ff_uy_y2WXz8cDe_WTReUlYbYYABoe0Sei6V6AG0apmhAMANgx4okdLAUnupBfetm56F9kZw5VqvvOYX6M3Jd7NbjtD5acnsBrvJYXT5YJML9u9ODGu7SnvLjNGtPhpcnQzW_8hubxY2xAJ5tIQJbpRSezrhr-7vy2m7g1LtGIqHYXAR0q5YLaRmppX8_ySfDjVETeTlifQ5lZKhf5iDEnvMgZ1yYE85mOCXfy78gP7-eP4TC1mx1A</recordid><startdate>20100601</startdate><enddate>20100601</enddate><creator>Boeva, Valentina</creator><creator>Surdez, Didier</creator><creator>Guillon, Noëlle</creator><creator>Tirode, Franck</creator><creator>Fejes, Anthony P</creator><creator>Delattre, Olivier</creator><creator>Barillot, 
Emmanuel</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>1XC</scope><scope>VOOES</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4731-7817</orcidid><orcidid>https://orcid.org/0000-0002-4382-7185</orcidid><orcidid>https://orcid.org/0000-0002-7118-7859</orcidid></search><sort><creationdate>20100601</creationdate><title>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</title><author>Boeva, Valentina ; Surdez, Didier ; Guillon, Noëlle ; Tirode, Franck ; Fejes, Anthony P ; Delattre, Olivier ; Barillot, Emmanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c512t-48e2e019bef3564fee769281eee382efe10558eb7c5743c9a1091f8436a9c6c73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Binding Sites</topic><topic>Biochemistry, Molecular Biology</topic><topic>Cancer</topic><topic>Cell Line, Tumor</topic><topic>Chromatin Immunoprecipitation - methods</topic><topic>Consensus Sequence</topic><topic>Genomics</topic><topic>Humans</topic><topic>Life Sciences</topic><topic>Methods Online</topic><topic>Molecular biology</topic><topic>Oncogene Proteins, Fusion - metabolism</topic><topic>Proto-Oncogene Protein c-fli-1 - metabolism</topic><topic>Regulatory Elements, Transcriptional</topic><topic>RNA-Binding Protein EWS</topic><topic>Sequence Analysis, DNA</topic><topic>Transcription Factors - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Boeva, Valentina</creatorcontrib><creatorcontrib>Surdez, 
Didier</creatorcontrib><creatorcontrib>Guillon, Noëlle</creatorcontrib><creatorcontrib>Tirode, Franck</creatorcontrib><creatorcontrib>Fejes, Anthony P</creatorcontrib><creatorcontrib>Delattre, Olivier</creatorcontrib><creatorcontrib>Barillot, Emmanuel</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Boeva, Valentina</au><au>Surdez, Didier</au><au>Guillon, Noëlle</au><au>Tirode, Franck</au><au>Fejes, Anthony P</au><au>Delattre, Olivier</au><au>Barillot, Emmanuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2010-06-01</date><risdate>2010</risdate><volume>38</volume><issue>11</issue><spage>e126</spage><epage>e126</epage><pages>e126-e126</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><abstract>Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. 
This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered &gt;2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to approximately 150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression--positively as often as negatively--and at much larger distances (up to approximately 1 Mb).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>20375099</pmid><doi>10.1093/nar/gkq217</doi><orcidid>https://orcid.org/0000-0003-4731-7817</orcidid><orcidid>https://orcid.org/0000-0002-4382-7185</orcidid><orcidid>https://orcid.org/0000-0002-7118-7859</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2010-06, Vol.38 (11), p.e126-e126
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2887977
source MEDLINE; DOAJ Directory of Open Access Journals; Access via Oxford University Press (Open Access Collection); PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Base Sequence
Binding Sites
Biochemistry, Molecular Biology
Cancer
Cell Line, Tumor
Chromatin Immunoprecipitation - methods
Consensus Sequence
Genomics
Humans
Life Sciences
Methods Online
Molecular biology
Oncogene Proteins, Fusion - metabolism
Proto-Oncogene Protein c-fli-1 - metabolism
Regulatory Elements, Transcriptional
RNA-Binding Protein EWS
Sequence Analysis, DNA
Transcription Factors - metabolism
title De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T05%3A03%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=De%20novo%20motif%20identification%20improves%20the%20accuracy%20of%20predicting%20transcription%20factor%20binding%20sites%20in%20ChIP-Seq%20data%20analysis&rft.jtitle=Nucleic%20acids%20research&rft.au=Boeva,%20Valentina&rft.date=2010-06-01&rft.volume=38&rft.issue=11&rft.spage=e126&rft.epage=e126&rft.pages=e126-e126&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkq217&rft_dat=%3Cproquest_pubme%3E733331806%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=733331806&rft_id=info:pmid/20375099&rfr_iscdi=true