CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network
• Integrates unsupervised and supervised methods to predict TF binding sites.
• CAE-CNN shares parameters, so training time can be significantly reduced.
• Gating units enable effective gradient-based training of the model.
• Uses only positive samples for pre-training to reduce noise and imp...
Saved in:
Published in: | Expert systems with applications 2021-11, Vol.183, p.115404, Article 115404 |
---|---|
Main authors: | Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 115404 |
container_title | Expert systems with applications |
container_volume | 183 |
creator | Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu |
description | • Integrates unsupervised and supervised methods to predict TF binding sites. • CAE-CNN shares parameters, so training time can be significantly reduced. • Gating units enable effective gradient-based training of the model. • Uses only positive samples for pre-training to reduce noise and improve accuracy.
A transcription factor binding site (TFBS) is a DNA sequence that binds a transcription factor and regulates the transcription of a gene. Although deep learning algorithms outperform traditional methods in predicting transcription factor binding sites, they often rely too heavily on negative sample data, which cannot be verified by experiment. In particular, training a model with such negative samples can introduce substantial noise and degrade classification performance. To address these drawbacks, we propose a new architecture that combines a convolutional autoencoder with a convolutional neural network, called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). Specifically, motivated by image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples of DNA nucleotides. The learned features are then used by the convolutional neural network during training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments on human and mouse TFBS datasets demonstrate the effectiveness of the proposed method for the motif discovery task, where it outperforms state-of-the-art methods in accuracy, precision, recall, and AUC. To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study TFBSs, thereby building a more robust and generative TFBS prediction model. |
doi_str_mv | 10.1016/j.eswa.2021.115404 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2579415050</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417421008265</els_id><sourcerecordid>2579415050</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2579415050</pqid></control><display><type>article</type><title>CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu</creator><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2021.115404</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><ispartof>Expert systems with applications, 2021-11, Vol.183, p.115404, Article 115404</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright Elsevier BV Nov 30, 2021</rights><lds50>peer_reviewed</lds50></display><links><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0957417421008265$$EHTML$$P50$$Gelsevier$$H</linktohtml></links><addata><format>journal</format><genre>article</genre><ristype>JOUR</ristype><date>2021-11-30</date><risdate>2021</risdate><volume>183</volume><spage>115404</spage><pages>115404-</pages><artnum>115404</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2021.115404</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2021-11, Vol.183, p.115404, Article 115404 |
issn | 0957-4174 ; 1873-6793 |
language | eng |
recordid | cdi_proquest_journals_2579415050 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Algorithms ; Artificial neural networks ; Autoencoder ; Binding sites ; Bioinformatics ; Convolutional neural networks ; Deep learning ; Deoxyribonucleic acid ; DNA ; Feature extraction ; Image reconstruction ; Machine learning ; Motif discovery ; Neural networks ; Nucleotides ; Prediction models ; Training ; Transcription factor binding sites ; Transcription factors |
title | CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T04%3A06%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CAE-CNN:%20Predicting%20transcription%20factor%20binding%20site%20with%20convolutional%20autoencoder%20and%20convolutional%20neural%20network&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Zhang,%20Yongqing&rft.date=2021-11-30&rft.volume=183&rft.spage=115404&rft.pages=115404-&rft.artnum=115404&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.115404&rft_dat=%3Cproquest_cross%3E2579415050%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2579415050&rft_id=info:pmid/&rft_els_id=S0957417421008265&rfr_iscdi=true |
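
The description mentions a highway connection layer that captures features through a gated unit. The record does not give the paper's exact formulation, so the following is only a minimal sketch of the standard highway layer, y = T(x) * H(x) + (1 - T(x)) * x, with a ReLU transform H and a sigmoid gate T. The function names, the toy input, and the weights are all hypothetical and chosen purely for illustration; they are not taken from CAE-CNN.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, weights, bias):
    """Compute W x + b for one input vector (weights is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer (assumed form, not the paper's exact layer)."""
    h = [max(0.0, v) for v in affine(x, w_h, b_h)]   # transform H(x), ReLU
    t = [sigmoid(v) for v in affine(x, w_t, b_t)]    # gate T(x) in (0, 1)
    # The gate blends the transformed features with the untouched input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Hypothetical toy values: identity transform weights, zero gate weights.
x = [0.5, -1.0, 2.0]
w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_h = [0.0, 0.0, 0.0]
w_t = [[0.0, 0.0, 0.0] for _ in range(3)]

# Strongly negative gate bias: the layer passes its input through.
print(highway_layer(x, w_h, b_h, w_t, [-20.0] * 3))
# Strongly positive gate bias: the layer outputs the ReLU transform.
print(highway_layer(x, w_h, b_h, w_t, [20.0] * 3))
```

With the gate nearly closed the output stays close to the input, and with the gate nearly open it approaches the transformed features. This carry path is what lets gradients flow through the layer during training, which is the property the highlights attribute to the gating units.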