CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network

• Integrates unsupervised and supervised methods to predict TF binding sites.
• CAE-CNN shares parameters, so training time can be significantly reduced.
• Gating units make gradient-based training of the model effective.
• Only use positive samples for pre-training to reduce the noise and imp...

Bibliographic details
Published in:	Expert systems with applications 2021-11, Vol.183, p.115404, Article 115404
Main authors:	Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu
Format:	Article
Language:	eng
Subjects:	Algorithms ; Artificial neural networks ; Autoencoder ; Binding sites ; Bioinformatics ; Convolutional neural networks ; Deep learning ; Deoxyribonucleic acid ; DNA ; Feature extraction ; Image reconstruction ; Machine learning ; Motif discovery ; Neural networks ; Nucleotides ; Prediction models ; Training ; Transcription factor binding sites ; Transcription factors
Online access:	Full text

container_end_page
container_issue
container_start_page	115404
container_title	Expert systems with applications
container_volume	183
creator	Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu
description	• Integrates unsupervised and supervised methods to predict TF binding sites. • CAE-CNN shares parameters, so training time can be significantly reduced. • Gating units make gradient-based training of the model effective. • Only positive samples are used for pre-training, to reduce noise and improve accuracy. A transcription factor binding site (TFBS) is a DNA sequence that binds a transcription factor and regulates the transcription of a gene. Although deep learning algorithms outperform traditional methods in predicting transcription factor binding sites, they often rely too heavily on negative sample data, which cannot be verified experimentally. In particular, training a model with such negative samples can introduce considerable noise and degrade classification performance. To address these drawbacks, we propose a new architecture that combines a convolutional autoencoder with a convolutional neural network, called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). Specifically, motivated by image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples of DNA nucleotides. The learned features are then used by the convolutional neural network during training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments on human and mouse TFBS datasets demonstrate the effectiveness of the proposed method for the motif discovery task: it outperforms state-of-the-art methods in accuracy, precision, recall, and AUC. To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study TFBSs, which makes it possible to build a more robust and generative TFBS prediction model.
doi_str_mv	10.1016/j.eswa.2021.115404
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2579415050</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417421008265</els_id><sourcerecordid>2579415050</sourcerecordid><originalsourceid>FETCH-LOGICAL-c258t-11f627109f988707258db43bba288d4e2f09ed46304a5aa50711589bf73217523</originalsourceid><addsrcrecordid>eNp9kM1OwzAQhC0EEqXwApwscU5ZO3acIC5VxJ9UFQ5wthzHAYcSF9tpxduTEE4cOI20O7Oa_RA6J7AgQLLLdmHCXi0oULIghDNgB2hGcpEmmSjSQzSDgouEEcGO0UkILQARAGKG-nJ5k5Tr9RV-8qa2OtruFUevuqC93UbrOtwoHZ3Hle3qcRlsNHhv4xvWrtu5TT-a1AarPjrTaVcbj1VX_9l2pvc_EvfOv5-io0Ztgjn71Tl6ub15Lu-T1ePdQ7lcJZryPCaENBkVBIqmyHMBYhjWFUurStE8r5mhDRSmZlkKTHGlOIjh97yoGpFSIjhN5-hiurv17rM3IcrW9X7oEyTlomCEA4fBRSeX9i4Ebxq59fZD-S9JQI54ZStHvHLEKye8Q-h6Cpmh_84aL4O2w_8DRG90lLWz_8W_AcdMhCo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2579415050</pqid></control><display><type>article</type><title>CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu</creator><creatorcontrib>Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu</creatorcontrib><description>•Integrate the unsupervised and supervised method to predict the TF binding site.•CAE-CNN share the parameters, so the training time can be significantly reduced.•Effectively gradient-based training the model by the gating units.•Only use positive samples for pre-training to reduce the noise and improve accuracy. Transcription factor binding site (TFBS) is a DNA sequence that binds to transcription factor and regulates the transcription process of the gene. 
Although deep learning algorithms are superior to traditional methods in predicting transcription factor binding site, they often rely too much on negative sample data, which cannot be verified by experiment. In particular, a training model with such negative samples can generate a lot of noisy data and affect the classification performance. In order to cope with the aforementioned drawbacks, we propose a new architecture by combining a convolutional autoencoder with convolutional neural network, which is called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). Specifically, motivated by the image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples in DNA nucleotides. Consequently, the learned features will be used by the convolutional neural network in the phase of training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments based on human and mouse TFBS datasets evaluate the effectiveness of the proposed method for the motif discovery task, outperforming the state-of-the-art methods in accuracy, precision, recall, and AUC value. 
To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study the TFBS, thereby being able to build a more robust and generative TFBS prediction model.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2021.115404</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Artificial neural networks ; Autoencoder ; Binding sites ; Bioinformatics ; Convolutional neural networks ; Deep learning ; Deoxyribonucleic acid ; DNA ; Feature extraction ; Image reconstruction ; Machine learning ; Motif discovery ; Neural networks ; Nucleotides ; Prediction models ; Training ; Transcription factor binding sites ; Transcription factors</subject><ispartof>Expert systems with applications, 2021-11, Vol.183, p.115404, Article 115404</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright Elsevier BV Nov 30, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c258t-11f627109f988707258db43bba288d4e2f09ed46304a5aa50711589bf73217523</citedby><cites>FETCH-LOGICAL-c258t-11f627109f988707258db43bba288d4e2f09ed46304a5aa50711589bf73217523</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0957417421008265$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Zhang, Yongqing</creatorcontrib><creatorcontrib>Qiao, Shaojie</creatorcontrib><creatorcontrib>Zeng, Yuanqi</creatorcontrib><creatorcontrib>Gao, Dongrui</creatorcontrib><creatorcontrib>Han, Nan</creatorcontrib><creatorcontrib>Zhou, Jiliu</creatorcontrib><title>CAE-CNN: Predicting transcription factor 
binding site with convolutional autoencoder and convolutional neural network</title><title>Expert systems with applications</title><description>•Integrate the unsupervised and supervised method to predict the TF binding site.•CAE-CNN share the parameters, so the training time can be significantly reduced.•Effectively gradient-based training the model by the gating units.•Only use positive samples for pre-training to reduce the noise and improve accuracy. Transcription factor binding site (TFBS) is a DNA sequence that binds to transcription factor and regulates the transcription process of the gene. Although deep learning algorithms are superior to traditional methods in predicting transcription factor binding site, they often rely too much on negative sample data, which cannot be verified by experiment. In particular, a training model with such negative samples can generate a lot of noisy data and affect the classification performance. In order to cope with the aforementioned drawbacks, we propose a new architecture by combining a convolutional autoencoder with convolutional neural network, which is called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). Specifically, motivated by the image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples in DNA nucleotides. Consequently, the learned features will be used by the convolutional neural network in the phase of training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments based on human and mouse TFBS datasets evaluate the effectiveness of the proposed method for the motif discovery task, outperforming the state-of-the-art methods in accuracy, precision, recall, and AUC value. 
To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study the TFBS, thereby being able to build a more robust and generative TFBS prediction model.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Autoencoder</subject><subject>Binding sites</subject><subject>Bioinformatics</subject><subject>Convolutional neural networks</subject><subject>Deep learning</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>Feature extraction</subject><subject>Image reconstruction</subject><subject>Machine learning</subject><subject>Motif discovery</subject><subject>Neural networks</subject><subject>Nucleotides</subject><subject>Prediction models</subject><subject>Training</subject><subject>Transcription factor binding sites</subject><subject>Transcription factors</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAQhC0EEqXwApwscU5ZO3acIC5VxJ9UFQ5wthzHAYcSF9tpxduTEE4cOI20O7Oa_RA6J7AgQLLLdmHCXi0oULIghDNgB2hGcpEmmSjSQzSDgouEEcGO0UkILQARAGKG-nJ5k5Tr9RV-8qa2OtruFUevuqC93UbrOtwoHZ3Hle3qcRlsNHhv4xvWrtu5TT-a1AarPjrTaVcbj1VX_9l2pvc_EvfOv5-io0Ztgjn71Tl6ub15Lu-T1ePdQ7lcJZryPCaENBkVBIqmyHMBYhjWFUurStE8r5mhDRSmZlkKTHGlOIjh97yoGpFSIjhN5-hiurv17rM3IcrW9X7oEyTlomCEA4fBRSeX9i4Ebxq59fZD-S9JQI54ZStHvHLEKye8Q-h6Cpmh_84aL4O2w_8DRG90lLWz_8W_AcdMhCo</recordid><startdate>20211130</startdate><enddate>20211130</enddate><creator>Zhang, Yongqing</creator><creator>Qiao, Shaojie</creator><creator>Zeng, Yuanqi</creator><creator>Gao, Dongrui</creator><creator>Han, Nan</creator><creator>Zhou, Jiliu</creator><general>Elsevier Ltd</general><general>Elsevier 
BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20211130</creationdate><title>CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network</title><author>Zhang, Yongqing ; Qiao, Shaojie ; Zeng, Yuanqi ; Gao, Dongrui ; Han, Nan ; Zhou, Jiliu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c258t-11f627109f988707258db43bba288d4e2f09ed46304a5aa50711589bf73217523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Autoencoder</topic><topic>Binding sites</topic><topic>Bioinformatics</topic><topic>Convolutional neural networks</topic><topic>Deep learning</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>Feature extraction</topic><topic>Image reconstruction</topic><topic>Machine learning</topic><topic>Motif discovery</topic><topic>Neural networks</topic><topic>Nucleotides</topic><topic>Prediction models</topic><topic>Training</topic><topic>Transcription factor binding sites</topic><topic>Transcription factors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yongqing</creatorcontrib><creatorcontrib>Qiao, Shaojie</creatorcontrib><creatorcontrib>Zeng, Yuanqi</creatorcontrib><creatorcontrib>Gao, Dongrui</creatorcontrib><creatorcontrib>Han, Nan</creatorcontrib><creatorcontrib>Zhou, Jiliu</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Yongqing</au><au>Qiao, Shaojie</au><au>Zeng, Yuanqi</au><au>Gao, Dongrui</au><au>Han, Nan</au><au>Zhou, Jiliu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network</atitle><jtitle>Expert systems with applications</jtitle><date>2021-11-30</date><risdate>2021</risdate><volume>183</volume><spage>115404</spage><pages>115404-</pages><artnum>115404</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•Integrate the unsupervised and supervised method to predict the TF binding site.•CAE-CNN share the parameters, so the training time can be significantly reduced.•Effectively gradient-based training the model by the gating units.•Only use positive samples for pre-training to reduce the noise and improve accuracy. Transcription factor binding site (TFBS) is a DNA sequence that binds to transcription factor and regulates the transcription process of the gene. Although deep learning algorithms are superior to traditional methods in predicting transcription factor binding site, they often rely too much on negative sample data, which cannot be verified by experiment. In particular, a training model with such negative samples can generate a lot of noisy data and affect the classification performance. In order to cope with the aforementioned drawbacks, we propose a new architecture by combining a convolutional autoencoder with convolutional neural network, which is called CAE-CNN (Convolutional AutoEncoder and Convolutional Neural Network). 
Specifically, motivated by the image reconstruction, we use a convolutional autoencoder to extract useful features from the positive samples in DNA nucleotides. Consequently, the learned features will be used by the convolutional neural network in the phase of training. Furthermore, we employ a highway connection layer to better capture the features of DNA nucleotides through a gated unit. Extensive experiments based on human and mouse TFBS datasets evaluate the effectiveness of the proposed method for the motif discovery task, outperforming the state-of-the-art methods in accuracy, precision, recall, and AUC value. To the best of our knowledge, the original contribution of this work lies in integrating unsupervised and supervised learning methods to study the TFBS, thereby being able to build a more robust and generative TFBS prediction model.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2021.115404</doi></addata></record>
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2021-11, Vol.183, p.115404, Article 115404
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_2579415050
source	Elsevier ScienceDirect Journals Complete
subjects	Algorithms ; Artificial neural networks ; Autoencoder ; Binding sites ; Bioinformatics ; Convolutional neural networks ; Deep learning ; Deoxyribonucleic acid ; DNA ; Feature extraction ; Image reconstruction ; Machine learning ; Motif discovery ; Neural networks ; Nucleotides ; Prediction models ; Training ; Transcription factor binding sites ; Transcription factors
title	CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T04%3A06%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CAE-CNN:%20Predicting%20transcription%20factor%20binding%20site%20with%20convolutional%20autoencoder%20and%20convolutional%20neural%20network&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Zhang,%20Yongqing&rft.date=2021-11-30&rft.volume=183&rft.spage=115404&rft.pages=115404-&rft.artnum=115404&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.115404&rft_dat=%3Cproquest_cross%3E2579415050%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2579415050&rft_id=info:pmid/&rft_els_id=S0957417421008265&rfr_iscdi=true
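
The description above mentions a highway connection layer that captures features of DNA nucleotides through a gated unit. The record gives no implementation details, so the following is only a minimal sketch of a generic highway layer applied to one-hot encoded nucleotides, not the authors' CAE-CNN code. The function names (`one_hot`, `affine`, `highway`), the A/C/G/T column order, the ReLU transform, and the square weight matrices are all assumptions made for illustration.

```python
import math

# Assumed column order for the one-hot encoding; the record does not specify one.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as one 4-element row per nucleotide.

    Symbols outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def sigmoid(z):
    """Logistic function used for the gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway(x, w_h, b_h, w_t, b_t):
    """Generic highway layer: y = t * relu(W_h x + b_h) + (1 - t) * x.

    The gate t = sigmoid(W_t x + b_t) decides, per dimension, how much of
    the transformed signal replaces the unchanged input.
    """
    h = [max(0.0, v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a strongly negative gate bias the layer passes its input through almost unchanged, and with a strongly positive one it returns the transformed features. That carry path is what is generally credited with keeping gradients usable in deeper stacks.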