promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences

Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of p...

Bibliographic details
Published in:	IEEE/ACM transactions on computational biology and bioinformatics, 2024-01, Vol.21 (1), p.208-214
Main authors:	Nagda, Bindi M.; Nguyen, Van Minh; White, Ryan T.
Format:	Article
Language:	eng
Subjects:	Base Sequence; Bioinformatics; Biological system modeling; Correlation coefficient; Correlation coefficients; Data mining; Deoxyribonucleic acid; DNA; DNA - genetics; DNA sequences; Ensemble learning; Feature extraction; Gene sequencing; Genomics; Machine Learning; Neural networks; Nucleotide sequence; Pattern analysis; Promoter regions; Promoter Regions, Genetic - genetics; Recurrent neural networks; Regulatory sequences; TATA Box; Transcription factors; Transcription initiation; Transcription, Genetic

container_end_page	214
container_issue	1
container_start_page	208
container_title	IEEE/ACM transactions on computational biology and bioinformatics
container_volume	21
creator	Nagda, Bindi M. ; Nguyen, Van Minh ; White, Ryan T.
description	Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications, and for identification of rarely expressed genes. We propose a novel ensemble learning technique using deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including promoter sequences with the TATA-box and without the TATA-box, within DNA sequences of four different species. Using extensive independent tests and previously published results, we demonstrate that our method sets a new state-of-the-art of over 98% Matthews correlation coefficient in all eight organism categories for recognizing the stretch of base pairs that code for the promoter region within DNA sequences.
doi_str_mv	10.1109/TCBB.2023.3339597
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2923118964</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10342761</ieee_id><sourcerecordid>2923118964</sourcerecordid><originalsourceid>FETCH-LOGICAL-c302t-42ba7dca36a344af30274fc6ba2dcedc1ef4a5d4878a6601c921a0c58910ccd43</originalsourceid><addsrcrecordid>eNpdkM1LwzAchoMobn78AYJIwIuXzny38ebmdMKmA_XiJWTpr9Kxtpp0B_97UzdFPCX58bxvkgehE0oGlBJ9-TwaDgeMMD7gnGup0x3Up1KmidZK7HZ7IROpFe-hgxCWhDChidhHPZ4RSRVVffT67pvqaTwbTsdXeGJ9jue2bcHXeFbWZf2GbZ3jcR2gWqwAT8H672nReHwDLbi2O908XON57GliED_BxxpqB-EI7RV2FeB4ux6il9vx82iSTB_v7kfX08RxwtpEsIVNc2e5slwIW8RhKgqnFpblDnJHoRBW5iJLM6sUoU4zaomTmabEuVzwQ3Sx6Y1fiVeH1lRlcLBa2RqadTAs05mWnCkS0fN_6LJZ-zq-zjDNOKVZFBcpuqGcb0LwUJh3X1bWfxpKTCfedOJNJ95sxcfM2bZ5vagg_038mI7A6QYoAeBPIRcsVZR_AcUxhaw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2923118964</pqid></control><display><type>article</type><title>promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences</title><source>IEEE Electronic Library (IEL)</source><creator>Nagda, Bindi M. ; Nguyen, Van Minh ; White, Ryan T.</creator><creatorcontrib>Nagda, Bindi M. ; Nguyen, Van Minh ; White, Ryan T.</creatorcontrib><description>Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications, and for identification of rarely expressed genes. 
We propose a novel ensemble learning technique using deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including promoter sequences with the TATA-box and without the TATA-box, within DNA sequences of four different species. Using extensive independent tests and previously published results, we demonstrate that our method sets a new state-of-the-art of over 98% Matthews correlation coefficient in all eight organism categories for recognizing the stretch of base pairs that code for the promoter region within DNA sequences.</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2023.3339597</identifier><identifier>PMID: 38051616</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Base Sequence ; Bioinformatics ; Biological system modeling ; Correlation coefficient ; Correlation coefficients ; Data mining ; Deoxyribonucleic acid ; DNA ; DNA - genetics ; DNA sequences ; Ensemble learning ; Feature extraction ; Gene sequencing ; Genomics ; Machine Learning ; Neural networks ; Nucleotide sequence ; Pattern analysis ; Promoter regions ; Promoter Regions, Genetic - genetics ; Recurrent neural networks ; Regulatory sequences ; TATA Box ; Transcription factors ; Transcription initiation ; Transcription, Genetic</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2024-01, Vol.21 (1), p.208-214</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c302t-42ba7dca36a344af30274fc6ba2dcedc1ef4a5d4878a6601c921a0c58910ccd43</cites><orcidid>0000-0003-3507-5494 ; 0000-0002-2479-2503 ; 0000-0002-5524-629X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10342761$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10342761$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38051616$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Nagda, Bindi M.</creatorcontrib><creatorcontrib>Nguyen, Van Minh</creatorcontrib><creatorcontrib>White, Ryan T.</creatorcontrib><title>promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications, and for identification of rarely expressed genes. 
We propose a novel ensemble learning technique using deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including promoter sequences with the TATA-box and without the TATA-box, within DNA sequences of four different species. Using extensive independent tests and previously published results, we demonstrate that our method sets a new state-of-the-art of over 98% Matthews correlation coefficient in all eight organism categories for recognizing the stretch of base pairs that code for the promoter region within DNA sequences.</description><subject>Base Sequence</subject><subject>Bioinformatics</subject><subject>Biological system modeling</subject><subject>Correlation coefficient</subject><subject>Correlation coefficients</subject><subject>Data mining</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA - genetics</subject><subject>DNA sequences</subject><subject>Ensemble learning</subject><subject>Feature extraction</subject><subject>Gene sequencing</subject><subject>Genomics</subject><subject>Machine Learning</subject><subject>Neural networks</subject><subject>Nucleotide sequence</subject><subject>Pattern analysis</subject><subject>Promoter regions</subject><subject>Promoter Regions, Genetic - genetics</subject><subject>Recurrent neural networks</subject><subject>Regulatory sequences</subject><subject>TATA Box</subject><subject>Transcription factors</subject><subject>Transcription initiation</subject><subject>Transcription, 
Genetic</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkM1LwzAchoMobn78AYJIwIuXzny38ebmdMKmA_XiJWTpr9Kxtpp0B_97UzdFPCX58bxvkgehE0oGlBJ9-TwaDgeMMD7gnGup0x3Up1KmidZK7HZ7IROpFe-hgxCWhDChidhHPZ4RSRVVffT67pvqaTwbTsdXeGJ9jue2bcHXeFbWZf2GbZ3jcR2gWqwAT8H672nReHwDLbi2O908XON57GliED_BxxpqB-EI7RV2FeB4ux6il9vx82iSTB_v7kfX08RxwtpEsIVNc2e5slwIW8RhKgqnFpblDnJHoRBW5iJLM6sUoU4zaomTmabEuVzwQ3Sx6Y1fiVeH1lRlcLBa2RqadTAs05mWnCkS0fN_6LJZ-zq-zjDNOKVZFBcpuqGcb0LwUJh3X1bWfxpKTCfedOJNJ95sxcfM2bZ5vagg_038mI7A6QYoAeBPIRcsVZR_AcUxhaw</recordid><startdate>202401</startdate><enddate>202401</enddate><creator>Nagda, Bindi M.</creator><creator>Nguyen, Van Minh</creator><creator>White, Ryan T.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-3507-5494</orcidid><orcidid>https://orcid.org/0000-0002-2479-2503</orcidid><orcidid>https://orcid.org/0000-0002-5524-629X</orcidid></search><sort><creationdate>202401</creationdate><title>promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences</title><author>Nagda, Bindi M. 
; Nguyen, Van Minh ; White, Ryan T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c302t-42ba7dca36a344af30274fc6ba2dcedc1ef4a5d4878a6601c921a0c58910ccd43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Base Sequence</topic><topic>Bioinformatics</topic><topic>Biological system modeling</topic><topic>Correlation coefficient</topic><topic>Correlation coefficients</topic><topic>Data mining</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA - genetics</topic><topic>DNA sequences</topic><topic>Ensemble learning</topic><topic>Feature extraction</topic><topic>Gene sequencing</topic><topic>Genomics</topic><topic>Machine Learning</topic><topic>Neural networks</topic><topic>Nucleotide sequence</topic><topic>Pattern analysis</topic><topic>Promoter regions</topic><topic>Promoter Regions, Genetic - genetics</topic><topic>Recurrent neural networks</topic><topic>Regulatory sequences</topic><topic>TATA Box</topic><topic>Transcription factors</topic><topic>Transcription initiation</topic><topic>Transcription, Genetic</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nagda, Bindi M.</creatorcontrib><creatorcontrib>Nguyen, Van Minh</creatorcontrib><creatorcontrib>White, Ryan T.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer 
and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Nagda, Bindi M.</au><au>Nguyen, Van Minh</au><au>White, Ryan T.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><date>2024-01</date><risdate>2024</risdate><volume>21</volume><issue>1</issue><spage>208</spage><epage>214</epage><pages>208-214</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Accurate 
identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications, and for identification of rarely expressed genes. We propose a novel ensemble learning technique using deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including promoter sequences with the TATA-box and without the TATA-box, within DNA sequences of four different species. Using extensive independent tests and previously published results, we demonstrate that our method sets a new state-of-the-art of over 98% Matthews correlation coefficient in all eight organism categories for recognizing the stretch of base pairs that code for the promoter region within DNA sequences.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38051616</pmid><doi>10.1109/TCBB.2023.3339597</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0003-3507-5494</orcidid><orcidid>https://orcid.org/0000-0002-2479-2503</orcidid><orcidid>https://orcid.org/0000-0002-5524-629X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1545-5963
ispartof	IEEE/ACM transactions on computational biology and bioinformatics, 2024-01, Vol.21 (1), p.208-214
issn	1545-5963 1557-9964
language	eng
recordid	cdi_proquest_journals_2923118964
source	IEEE Electronic Library (IEL)
subjects	Base Sequence ; Bioinformatics ; Biological system modeling ; Correlation coefficient ; Correlation coefficients ; Data mining ; Deoxyribonucleic acid ; DNA ; DNA - genetics ; DNA sequences ; Ensemble learning ; Feature extraction ; Gene sequencing ; Genomics ; Machine Learning ; Neural networks ; Nucleotide sequence ; Pattern analysis ; Promoter regions ; Promoter Regions, Genetic - genetics ; Recurrent neural networks ; Regulatory sequences ; TATA Box ; Transcription factors ; Transcription initiation ; Transcription, Genetic
title	promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T14%3A43%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=promSEMBLE:%20Hard%20Pattern%20Mining%20and%20Ensemble%20Learning%20for%20Detecting%20DNA%20Promoter%20Sequences&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Nagda,%20Bindi%20M.&rft.date=2024-01&rft.volume=21&rft.issue=1&rft.spage=208&rft.epage=214&rft.pages=208-214&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2023.3339597&rft_dat=%3Cproquest_RIE%3E2923118964%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2923118964&rft_id=info:pmid/38051616&rft_ieee_id=10342761&rfr_iscdi=true