Prediction of protein N-terminal acetylation modification sites based on CNN-BiLSTM-attention model

N-terminal acetylation is one of the most common and important post-translational modifications (PTMs) of eukaryotic proteins. PTMs play a crucial role in various cellular processes and in disease pathogenesis. Thus, accurate identification of N-terminal acetylation sites is important for gaining insight into cellular processes and other possible functional mechanisms.

Detailed description

Bibliographic details
Published in: Computers in biology and medicine 2024-05, Vol.174, p.108330, Article 108330
Main authors: Ke, Jinsong; Zhao, Jianmei; Li, Hongfei; Yuan, Lei; Dong, Guanghui; Wang, Guohua
Format: Article
Language: eng
Subjects:
Online access: Full text
container_start_page 108330
container_title Computers in biology and medicine
container_volume 174
creator Ke, Jinsong
Zhao, Jianmei
Li, Hongfei
Yuan, Lei
Dong, Guanghui
Wang, Guohua
description N-terminal acetylation is one of the most common and important post-translational modifications (PTMs) of eukaryotic proteins. PTMs play a crucial role in various cellular processes and in disease pathogenesis. Thus, accurate identification of N-terminal acetylation sites is important for gaining insight into cellular processes and other possible functional mechanisms. Although some algorithmic models have been proposed, most were developed with traditional machine learning algorithms on small training datasets, which limits their practical applications. Deep learning models, by contrast, are better suited to high-throughput and complex data. In this study, DeepCBA, a model based on a hybrid deep-learning framework of a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism, was constructed to detect N-terminal acetylation sites. DeepCBA was built as follows. First, a benchmark dataset was generated by selecting low-redundancy protein sequences from the UniProt database and further reducing sequence redundancy with the CD-HIT tool. Subsequently, tripeptide word-vector features were generated on the benchmark dataset using the skip-gram model of the word2vec algorithm. Finally, the CNN, BiLSTM, and attention mechanism were combined, and the tripeptide word-vector features were fed into the stacked model for multiple rounds of training. The model performed excellently on the independent test dataset, with an accuracy of 80.51% and an area under the curve of 87.36%. Altogether, DeepCBA achieved superior performance compared with the baseline model and significantly outperformed most existing predictors. Additionally, our model can be used to identify disease loci and drug targets.
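The encoding step described above (splitting each protein sequence into overlapping tripeptide "words" before word2vec embedding) can be sketched as follows. This is a minimal illustration, not the authors' code; the window step of 1 is an assumption, since the record does not state the exact tokenization parameters.

```python
def tripeptides(seq: str, step: int = 1) -> list[str]:
    """Slide a width-3 window over a protein sequence, yielding the
    overlapping tripeptide tokens that a word2vec skip-gram model
    would treat as the 'words' of one 'sentence'."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

# Example: a short N-terminal fragment becomes a sentence of tripeptides.
print(tripeptides("MADKLT"))  # ['MAD', 'ADK', 'DKL', 'KLT']
```

The resulting token lists would then be passed to a skip-gram trainer (e.g. gensim's `Word2Vec` with `sg=1`) to learn one vector per tripeptide.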
• Encoding protein sequences into tripeptide word vectors using the word2vec algorithm.
• The attention mechanism improved the recognition of acetylation sites.
• The acetylated proteins identified by the model are associated with diseases.
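The attention mechanism credited above can be illustrated with a minimal, framework-free sketch: one score per sequence position is softmax-normalised and used to weight the per-position feature vectors (plain lists standing in for BiLSTM hidden states). The scoring values here are placeholders, not the paper's trained attention layer.

```python
import math

def attention_pool(states: list[list[float]], scores: list[float]) -> list[float]:
    """Softmax-normalise one score per position, then return the
    attention-weighted sum of the per-position state vectors."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]

# Three 2-d "hidden states"; the middle position gets the highest score,
# so the pooled vector is pulled toward the middle state [0.0, 1.0].
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.1, 2.0, 0.1])
```

With uniform scores the pooling reduces to a plain average, which is why a learned scorer lets the model emphasise positions informative for the acetylation site.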
doi_str_mv 10.1016/j.compbiomed.2024.108330
format Article
fulltext fulltext
identifier ISSN: 0010-4825; EISSN: 1879-0534; DOI: 10.1016/j.compbiomed.2024.108330; PMID: 38588617
ispartof Computers in biology and medicine, 2024-05, Vol.174, p.108330, Article 108330
issn 0010-4825
1879-0534
eissn 1879-0534
language eng
recordid cdi_proquest_miscellaneous_3035076015
source MEDLINE; Elsevier ScienceDirect Journals
subjects Acetylation
Algorithms
Amino acids
Artificial neural networks
Attention
Benchmarks
BiLSTM
CNN
Databases, Protein
Datasets
Decision trees
Deep Learning
Enzymes
Humans
KEGG
Localization
Long short-term memory
Machine learning
N-terminal acetylation
Neural networks
Neural Networks, Computer
Pathogenesis
Post-translation
Protein Processing, Post-Translational
Proteins
Proteins - chemistry
Proteins - metabolism
Redundancy
Support vector machines
Therapeutic targets
Tripeptide word vectors
title Prediction of protein N-terminal acetylation modification sites based on CNN-BiLSTM-attention model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A38%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20protein%20N-terminal%20acetylation%20modification%20sites%20based%20on%20CNN-BiLSTM-attention%20model&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Ke,%20Jinsong&rft.date=2024-05&rft.volume=174&rft.spage=108330&rft.pages=108330-&rft.artnum=108330&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2024.108330&rft_dat=%3Cproquest_cross%3E3046570730%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3046570730&rft_id=info:pmid/38588617&rft_els_id=S0010482524004141&rfr_iscdi=true