iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength

As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a wide range of human pathologies. Therefore, identifying enhancers and their strength may provide insigh...

Bibliographic details
Published in: International journal of molecular sciences 2021-03, Vol.22 (7), p.3589
Main authors: Yang, Runtao; Wu, Feng; Zhang, Chengjin; Zhang, Lina
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 7
container_start_page 3589
container_title International journal of molecular sciences
container_volume 22
creator Yang, Runtao
Wu, Feng
Zhang, Chengjin
Zhang, Lina
description As critical components of DNA, enhancers can efficiently and specifically control the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a wide range of human pathologies. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new predictor of enhancers and their strength, iEnhancer-GAN, is proposed, based on a deep learning framework that combines word embedding with a sequence generative adversarial net (Seq-GAN). Because the training dataset is relatively small, the Seq-GAN is used to generate artificial sequences. Given that each functional element in a DNA sequence is analogous to a "word" in linguistics, word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform the "words" into numerical vectors. Because convolutional neural networks (CNNs) can extract high-level abstract features, a CNN architecture is constructed to perform the identification tasks, and the word vectors of each DNA sequence are vertically concatenated to form the embedding matrix used as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN in expanding the training dataset, the possibility of applying word segmentation methods to extract "words" from DNA sequences, the feasibility of using the skip-gram model to encode DNA sequences, and the strong predictive ability of the CNN. Compared with other state-of-the-art methods on the training dataset and an independent test dataset, the proposed method achieves a significantly improved overall performance. The proposed method is expected to help advance enhancer-related fields.
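The encoding step described in the abstract (segment a DNA sequence into "words", map each word to a vector, stack the vectors into a matrix for the CNN) can be illustrated with a minimal sketch. The abstract does not say which word segmentation methods the authors use, so overlapping k-mers are an assumption here, and the random vectors are only a stand-in for trained skip-gram embeddings. The names `segment_kmers`, `random_word_vectors`, and `build_embedding_matrix` are hypothetical and do not come from the paper.

```python
import random


def segment_kmers(sequence, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer "words".

    Assumption: k-mer splitting stands in for the paper's word
    segmentation methods, which the abstract does not specify.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def random_word_vectors(vocabulary, dim=4, seed=0):
    """Placeholder for skip-gram training: one random vector per word."""
    rng = random.Random(seed)
    return {
        word: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for word in sorted(vocabulary)
    }


def build_embedding_matrix(words, word_vectors):
    """Stack the word vectors row by row (vertical concatenation)."""
    return [word_vectors[word] for word in words]


words = segment_kmers("ACGTAC", k=3)
vectors = random_word_vectors(set(words), dim=4)
matrix = build_embedding_matrix(words, vectors)
print(words)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

In this sketch each row of `matrix` corresponds to one "word" and each column to one embedding dimension, which matches the abstract's description of vertically concatenated word vectors forming the CNN input. A real pipeline would replace `random_word_vectors` with vectors learned by a skip-gram model.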
doi_str_mv 10.3390/ijms22073589
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8036415</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2508590266</sourcerecordid><originalsourceid>FETCH-LOGICAL-c412t-88072a8f705d76ed87813011296130d06d9956a69615bf38c99372d0b291dac93</originalsourceid><addsrcrecordid>eNpdkUtvEzEUhUcIREvLjjWyxKYLBvzI-MECKUrTUCkqi7ZiOfKM72QcZuxgO6n6X_pj69KHAqvrK3_33Ht0iuIDwV8YU_irXY-RUixYJdWr4pBMKC0x5uL13vugeBfjGmPKaKXeFgeMSSwZEYfFnZ27XrsWQrmYXnxDU3QKsEFL0MFZt0JnQY9w48NvZB2a-bGxTifrHbqxqUe_fDBoPjZgzAOsnUGX8GcLWQ8twEHI7A7Q1OwgRB2sHtAFJJQ8Ojfgku1u0fP6-Hf6qgcb0GUK4FapPy7edHqI8P6pHhXXZ_Or2Y9y-XNxPpsuy3ZCaCqlxIJq2QlcGcHBSCEJw4RQxXM1mBulKq55bqumY7JViglqcEMVMbpV7Kj4_qi72TYjmDafFvRQb4Iddbitvbb1vz_O9vXK72qJGZ-QKgucPAkEn-3HVI82tjAM2oHfxppWWFYKU84z-uk_dO23wWV7mZpIriqBSaY-P1Jt8DEG6F6OIbh-iL3ejz3jH_cNvMDPObN7tSKpPw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2548695701</pqid></control><display><type>article</type><title>iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>MEDLINE</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Yang, Runtao ; Wu, Feng ; Zhang, Chengjin ; Zhang, Lina</creator><creatorcontrib>Yang, Runtao ; Wu, Feng ; Zhang, Chengjin ; Zhang, Lina</creatorcontrib><description>As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. 
Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a "word" in linguistics, the word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform the "words" into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract "words" from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. 
It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.</description><identifier>ISSN: 1422-0067</identifier><identifier>ISSN: 1661-6596</identifier><identifier>EISSN: 1422-0067</identifier><identifier>DOI: 10.3390/ijms22073589</identifier><identifier>PMID: 33808317</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Algorithms ; Cancer ; Critical components ; Datasets ; Deep Learning ; Deoxyribonucleic acid ; DNA ; DNA - genetics ; DNA methylation ; Enhancer Elements, Genetic - genetics ; Enhancers ; Gene expression ; Gene regulation ; Genetic engineering ; Genomes ; Human pathology ; Image Processing, Computer-Assisted - methods ; Linguistics ; Machine learning ; Methods ; Models, Theoretical ; Molecular modelling ; Neural networks ; Neural Networks, Computer ; Nucleotide sequence ; Regulatory Sequences, Nucleic Acid - genetics ; Segmentation ; Sequence Analysis, DNA - methods ; Support vector machines ; Therapeutic targets ; Transcription</subject><ispartof>International journal of molecular sciences, 2021-03, Vol.22 (7), p.3589</ispartof><rights>2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 by the authors. 
2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c412t-88072a8f705d76ed87813011296130d06d9956a69615bf38c99372d0b291dac93</citedby><cites>FETCH-LOGICAL-c412t-88072a8f705d76ed87813011296130d06d9956a69615bf38c99372d0b291dac93</cites><orcidid>0000-0002-0066-2114</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036415/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036415/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33808317$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yang, Runtao</creatorcontrib><creatorcontrib>Wu, Feng</creatorcontrib><creatorcontrib>Zhang, Chengjin</creatorcontrib><creatorcontrib>Zhang, Lina</creatorcontrib><title>iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength</title><title>International journal of molecular sciences</title><addtitle>Int J Mol Sci</addtitle><description>As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. 
In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a "word" in linguistics, the word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform the "words" into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract "words" from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. 
It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.</description><subject>Algorithms</subject><subject>Cancer</subject><subject>Critical components</subject><subject>Datasets</subject><subject>Deep Learning</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA - genetics</subject><subject>DNA methylation</subject><subject>Enhancer Elements, Genetic - genetics</subject><subject>Enhancers</subject><subject>Gene expression</subject><subject>Gene regulation</subject><subject>Genetic engineering</subject><subject>Genomes</subject><subject>Human pathology</subject><subject>Image Processing, Computer-Assisted - methods</subject><subject>Linguistics</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Models, Theoretical</subject><subject>Molecular modelling</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Nucleotide sequence</subject><subject>Regulatory Sequences, Nucleic Acid - genetics</subject><subject>Segmentation</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Support vector machines</subject><subject>Therapeutic 
targets</subject><subject>Transcription</subject><issn>1422-0067</issn><issn>1661-6596</issn><issn>1422-0067</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNpdkUtvEzEUhUcIREvLjjWyxKYLBvzI-MECKUrTUCkqi7ZiOfKM72QcZuxgO6n6X_pj69KHAqvrK3_33Ht0iuIDwV8YU_irXY-RUixYJdWr4pBMKC0x5uL13vugeBfjGmPKaKXeFgeMSSwZEYfFnZ27XrsWQrmYXnxDU3QKsEFL0MFZt0JnQY9w48NvZB2a-bGxTifrHbqxqUe_fDBoPjZgzAOsnUGX8GcLWQ8twEHI7A7Q1OwgRB2sHtAFJJQ8Ojfgku1u0fP6-Hf6qgcb0GUK4FapPy7edHqI8P6pHhXXZ_Or2Y9y-XNxPpsuy3ZCaCqlxIJq2QlcGcHBSCEJw4RQxXM1mBulKq55bqumY7JViglqcEMVMbpV7Kj4_qi72TYjmDafFvRQb4Iddbitvbb1vz_O9vXK72qJGZ-QKgucPAkEn-3HVI82tjAM2oHfxppWWFYKU84z-uk_dO23wWV7mZpIriqBSaY-P1Jt8DEG6F6OIbh-iL3ejz3jH_cNvMDPObN7tSKpPw</recordid><startdate>20210330</startdate><enddate>20210330</enddate><creator>Yang, Runtao</creator><creator>Wu, Feng</creator><creator>Zhang, Chengjin</creator><creator>Zhang, Lina</creator><general>MDPI 
AG</general><general>MDPI</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>MBDVC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-0066-2114</orcidid></search><sort><creationdate>20210330</creationdate><title>iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength</title><author>Yang, Runtao ; Wu, Feng ; Zhang, Chengjin ; Zhang, Lina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c412t-88072a8f705d76ed87813011296130d06d9956a69615bf38c99372d0b291dac93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Cancer</topic><topic>Critical components</topic><topic>Datasets</topic><topic>Deep Learning</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA - genetics</topic><topic>DNA methylation</topic><topic>Enhancer Elements, Genetic - genetics</topic><topic>Enhancers</topic><topic>Gene expression</topic><topic>Gene regulation</topic><topic>Genetic engineering</topic><topic>Genomes</topic><topic>Human pathology</topic><topic>Image Processing, Computer-Assisted - methods</topic><topic>Linguistics</topic><topic>Machine 
learning</topic><topic>Methods</topic><topic>Models, Theoretical</topic><topic>Molecular modelling</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Nucleotide sequence</topic><topic>Regulatory Sequences, Nucleic Acid - genetics</topic><topic>Segmentation</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Support vector machines</topic><topic>Therapeutic targets</topic><topic>Transcription</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Runtao</creatorcontrib><creatorcontrib>Wu, Feng</creatorcontrib><creatorcontrib>Zhang, Chengjin</creatorcontrib><creatorcontrib>Zhang, Lina</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>ProQuest Health &amp; Medical Complete 
(Alumni)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>International journal of molecular sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Runtao</au><au>Wu, Feng</au><au>Zhang, Chengjin</au><au>Zhang, Lina</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength</atitle><jtitle>International journal of molecular sciences</jtitle><addtitle>Int J Mol Sci</addtitle><date>2021-03-30</date><risdate>2021</risdate><volume>22</volume><issue>7</issue><spage>3589</spage><pages>3589-</pages><issn>1422-0067</issn><issn>1661-6596</issn><eissn>1422-0067</eissn><abstract>As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. 
In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a "word" in linguistics, the word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform the "words" into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract "words" from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>33808317</pmid><doi>10.3390/ijms22073589</doi><orcidid>https://orcid.org/0000-0002-0066-2114</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1422-0067
ispartof International journal of molecular sciences, 2021-03, Vol.22 (7), p.3589
issn 1422-0067
1661-6596
1422-0067
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8036415
source MDPI - Multidisciplinary Digital Publishing Institute; MEDLINE; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Algorithms
Cancer
Critical components
Datasets
Deep Learning
Deoxyribonucleic acid
DNA
DNA - genetics
DNA methylation
Enhancer Elements, Genetic - genetics
Enhancers
Gene expression
Gene regulation
Genetic engineering
Genomes
Human pathology
Image Processing, Computer-Assisted - methods
Linguistics
Machine learning
Methods
Models, Theoretical
Molecular modelling
Neural networks
Neural Networks, Computer
Nucleotide sequence
Regulatory Sequences, Nucleic Acid - genetics
Segmentation
Sequence Analysis, DNA - methods
Support vector machines
Therapeutic targets
Transcription
title iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T02%3A45%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=iEnhancer-GAN:%20A%20Deep%20Learning%20Framework%20in%20Combination%20with%20Word%20Embedding%20and%20Sequence%20Generative%20Adversarial%20Net%20to%20Identify%20Enhancers%20and%20Their%20Strength&rft.jtitle=International%20journal%20of%20molecular%20sciences&rft.au=Yang,%20Runtao&rft.date=2021-03-30&rft.volume=22&rft.issue=7&rft.spage=3589&rft.pages=3589-&rft.issn=1422-0067&rft.eissn=1422-0067&rft_id=info:doi/10.3390/ijms22073589&rft_dat=%3Cproquest_pubme%3E2508590266%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2548695701&rft_id=info:pmid/33808317&rfr_iscdi=true