promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences
Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of p...
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2024-01, Vol.21 (1), p.208-214 |
---|---|
Main authors: | Nagda, Bindi M.; Nguyen, Van Minh; White, Ryan T. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 214 |
---|---|
container_issue | 1 |
container_start_page | 208 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 21 |
creator | Nagda, Bindi M.; Nguyen, Van Minh; White, Ryan T. |
description | Accurate identification of DNA promoter sequences is of crucial importance in unraveling the underlying mechanisms that regulate gene transcription. Initiation of transcription is controlled through regulatory transcription factors binding to promoter core regions in the DNA sequence. Detection of promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications, and for identification of rarely expressed genes. We propose a novel ensemble learning technique using deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including promoter sequences with the TATA-box and without the TATA-box, within DNA sequences of four different species. Using extensive independent tests and previously published results, we demonstrate that our method sets a new state-of-the-art of over 98% Matthews correlation coefficient in all eight organism categories for recognizing the stretch of base pairs that code for the promoter region within DNA sequences. |
doi_str_mv | 10.1109/TCBB.2023.3339597 |
format | Article |
fullrecord | Record ID: TN_cdi_proquest_journals_2923118964 (source: proquest_RIE; source format: XML; IEEE ID: 10342761; ProQuest ID: 2923118964). Type: article, peer reviewed. Identifiers: ISSN 1545-5963; EISSN 1557-9964; DOI 10.1109/TCBB.2023.3339597; PMID 38051616; CODEN ITCBCY. Publisher: United States: IEEE. Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024. Short journal titles: TCBB; IEEE/ACM Trans Comput Biol Bioinform. Total pages: 7. Author ORCIDs: https://orcid.org/0000-0003-3507-5494 ; https://orcid.org/0000-0002-2479-2503 ; https://orcid.org/0000-0002-5524-629X. Links: https://ieeexplore.ieee.org/document/10342761 (IEEE record); https://www.ncbi.nlm.nih.gov/pubmed/38051616 (MEDLINE/PubMed record). |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2024-01, Vol.21 (1), p.208-214 |
issn | 1545-5963 1557-9964 |
language | eng |
recordid | cdi_proquest_journals_2923118964 |
source | IEEE Electronic Library (IEL) |
subjects | Base Sequence; Bioinformatics; Biological system modeling; Correlation coefficient; Correlation coefficients; Data mining; Deoxyribonucleic acid; DNA; DNA - genetics; DNA sequences; Ensemble learning; Feature extraction; Gene sequencing; Genomics; Machine Learning; Neural networks; Nucleotide sequence; Pattern analysis; Promoter regions; Promoter Regions, Genetic - genetics; Recurrent neural networks; Regulatory sequences; TATA Box; Transcription factors; Transcription initiation; Transcription, Genetic |
title | promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T14%3A43%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=promSEMBLE:%20Hard%20Pattern%20Mining%20and%20Ensemble%20Learning%20for%20Detecting%20DNA%20Promoter%20Sequences&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Nagda,%20Bindi%20M.&rft.date=2024-01&rft.volume=21&rft.issue=1&rft.spage=208&rft.epage=214&rft.pages=208-214&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2023.3339597&rft_dat=%3Cproquest_RIE%3E2923118964%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2923118964&rft_id=info:pmid/38051616&rft_ieee_id=10342761&rfr_iscdi=true |
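
The description in this record reports performance as a Matthews correlation coefficient (MCC). As a reminder of what that figure measures, the following is a minimal Python sketch of the standard MCC formula computed from binary confusion-matrix counts. It is not the authors' evaluation code: the function name `mcc`, the zero-denominator convention, and the example counts are assumptions chosen purely for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn: correctly predicted promoters / non-promoters (hypothetical labels)
    fp/fn: non-promoters predicted as promoters / promoters that were missed
    """
    # Product of the four marginal totals of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not taken from the paper.
example = mcc(tp=99, tn=99, fp=1, fn=1)
print(f"MCC = {example:.2f}")
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right). Because it uses all four cells of the confusion matrix, it stays informative when promoter and non-promoter classes are imbalanced.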