ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique

Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identification and understanding of lysine crotonylation sites is crucial in the f...


Bibliographic details
Published in: Computational biology and chemistry, 2024-12, Vol.113, p.108212, Article 108212
Main authors: Zuo, Yun; Wan, Minquan; Shen, Yang; Wang, Xinheng; He, Wenying; Bi, Yue; Liu, Xiangrong; Deng, Zhaohong
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page 108212
container_title Computational biology and chemistry
container_volume 113
creator Zuo, Yun
Wan, Minquan
Shen, Yang
Wang, Xinheng
He, Wenying
Bi, Yue
Liu, Xiangrong
Deng, Zhaohong
description Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identifying and understanding lysine crotonylation sites is crucial in protein research. However, as the number of non-histone crotonylation sites grows, existing classifiers based on traditional machine learning may run into performance limitations. To address this problem, and given the advantages of deep learning for sequence data analysis, this study presents an MLP-Attention-based deep learning model for identifying crotonylation sites. First, three feature extraction strategies (Amino Acid Composition, K-mer, and distance-based residue features) were used to encode crotonylated and non-crotonylated sequences. Then, to balance the training dataset, the FCM-GRNN undersampling algorithm, which combines fuzzy clustering with a generalized regression neural network, was introduced. Finally, various classification algorithms were compared experimentally, and a multilayer perceptron (MLP) combined with a superimposed self-attention mechanism was selected to construct the prediction model, ILYCROsite. Independent testing and five-fold cross-validation demonstrated that ILYCROsite had excellent performance. Notably, on the independent test set, ILYCROsite achieved an AUC of 87.93%, which is significantly better than existing state-of-the-art models. In addition, SHAP (SHapley Additive exPlanations) values were used to analyze feature importance and the impact of features on model predictions. To make the model easier for researchers to use, we developed a prediction program that identifies crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite.
• Help researchers better understand lysine crotonylation sites and associated mechanisms.
• Using computational methods to predict related sites is less costly and faster.
• Introduction to
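The description names Amino Acid Composition and K-mer as two of the sequence encodings. As a rough illustration of what such encodings usually compute, the following Python sketch builds both feature vectors for a peptide window. It is not taken from the ILYCROsite repository: the function names, the choice of k = 2, the normalization to frequencies, and the handling of non-standard characters (ignored here) are all assumptions made for illustration, since this record does not state them.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; any other character (e.g. a padding
# symbol such as "X" or "-") is ignored. This handling is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Amino Acid Composition: frequency of each standard residue."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer(seq: str, k: int = 2) -> list[float]:
    """K-mer composition: frequency of each length-k word over the
    standard alphabet, counted with a sliding window of step 1."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only windows made entirely of standard residues.
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]


def encode(seq: str, k: int = 2) -> list[float]:
    """Concatenate the AAC and k-mer vectors (hypothetical helper)."""
    return aac(seq) + kmer(seq, k)
```

With k = 2, `encode` returns 20 + 400 = 420 values per sequence. The third encoding mentioned in the description (distance-based residue features) is omitted because its definition is not given in this record.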
doi_str_mv 10.1016/j.compbiolchem.2024.108212
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3105491112</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1476927124002007</els_id><sourcerecordid>3105491112</sourcerecordid><originalsourceid>FETCH-LOGICAL-c253t-41fc7dc97383cdea11ec2585aa3283419357529403fea3e747abb057e26222b3</originalsourceid><addsrcrecordid>eNqNkF1PwjAUhhujEfz4C2bxypthPza6cWdQkAQhIVzoVdN1Z1KytbhuJvx7S4bES6_anvOcvjkPQvcEDwgmw8ftQNlql2lbqg1UA4pp5BsJJfQM9UnEh2FKk_fz052THrpybosxZRjHl6jHfJGncdpH2Wz-MV4tnW5gFMxyMI0utJKNtiawRVDunTYQqNo21uzLrn6AXZBJB3ngn5PxWzhdLRZBa3Konax2pTafQQNqY_RXCzfoopClg9vjeY3Wk5f1-DWcL6ez8dM8VDRmTRiRQvFcpZwlTOUgCQHfSGIpGU1YRFIW85imEWYFSAY84jLLcMyBDimlGbtGD923u9r6VNeISjsFZSkN2NYJRnAcpYQQ6tFRh_q9nKuhELtaV7LeC4LFQbHYir-KxUGx6BT74btjTptVkJ9Gf5164LkDwC_7raEWTmkwCnJdg2pEbvV_cn4A9yCT8Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3105491112</pqid></control><display><type>article</type><title>ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique</title><source>MEDLINE</source><source>ScienceDirect Journals (5 years ago - present)</source><creator>Zuo, Yun ; Wan, Minquan ; Shen, Yang ; Wang, Xinheng ; He, Wenying ; Bi, Yue ; Liu, Xiangrong ; Deng, Zhaohong</creator><creatorcontrib>Zuo, Yun ; Wan, Minquan ; Shen, Yang ; Wang, Xinheng ; He, Wenying ; Bi, Yue ; Liu, Xiangrong ; Deng, Zhaohong</creatorcontrib><description>Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identification and understanding of lysine crotonylation sites is crucial in the field of protein research. 
However, due to the increasing amount of non-histone crotonylation sites, existing classifiers based on traditional machine learning may encounter performance limitations. In order to address this problem, a novel deep learning-based model for identifying crotonylation sites is presented in this study, given the unique advantages of deep learning techniques for sequence data analysis. In this study, an MLP-Attention-based model was developed for the identification of crotonylation sites. Firstly, three feature extraction strategies, namely Amino Acid Composition, K-mer, and Distance-based residue features extraction strategy, were used to encode crotonylated and non-crotonylated sequences. Then, in order to balance the training dataset, the FCM-GRNN undersampling algorithm combining fuzzy clustering and generalized neural network approaches was introduced. Finally, to improve the effectiveness of crotonylation site identification, we explored various classification algorithms, and based on the relevant experimental performance comparisons, the multilayer perceptron (MLP) combined with the superimposed self-attention mechanism was finally selected to construct the prediction model ILYCROsite. The results obtained from independent testing and five-fold cross-validation demonstrated that the model proposed in this study, ILYCROsite, had excellent performance. Notably, on the independent test set, ILYCROsite achieves an AUC value of 87.93 %, which is significantly better than the existing state-of-the-art models. In addition, SHAP (Shapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. Meanwhile, in order to facilitate researchers to use the prediction model constructed in this study, we developed a prediction program to identify the crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite. 
[Display omitted] •Help researchers better understand lysine crotonylation sites and associated mechanisms.•Using computational methods to predict related sites is less costly and faster.•Introduction to applications and innovations in lysine crotonylation site recognition in recent years.•Compare differences of models in the design of algorithms and feature extracted.•Discussion of shortcomings and future prospects for this topic at the intersection of biology and computing.</description><identifier>ISSN: 1476-9271</identifier><identifier>ISSN: 1476-928X</identifier><identifier>EISSN: 1476-928X</identifier><identifier>DOI: 10.1016/j.compbiolchem.2024.108212</identifier><identifier>PMID: 39277959</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Algorithms ; Fully connected neural network ; Imbalance data processing ; Lysine - chemistry ; Lysine - metabolism ; Neural Networks, Computer ; Protein lysine crotonylation ; Protein Processing, Post-Translational ; Proteins - chemistry ; Proteins - metabolism ; Sequence analysis</subject><ispartof>Computational biology and chemistry, 2024-12, Vol.113, p.108212, Article 108212</ispartof><rights>2024 Elsevier Ltd</rights><rights>Copyright © 2024 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c253t-41fc7dc97383cdea11ec2585aa3283419357529403fea3e747abb057e26222b3</cites><orcidid>0009-0009-3877-8102 ; 0009-0009-2452-7580</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.compbiolchem.2024.108212$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3548,27922,27923,45993</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39277959$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zuo, Yun</creatorcontrib><creatorcontrib>Wan, Minquan</creatorcontrib><creatorcontrib>Shen, Yang</creatorcontrib><creatorcontrib>Wang, Xinheng</creatorcontrib><creatorcontrib>He, Wenying</creatorcontrib><creatorcontrib>Bi, Yue</creatorcontrib><creatorcontrib>Liu, Xiangrong</creatorcontrib><creatorcontrib>Deng, Zhaohong</creatorcontrib><title>ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique</title><title>Computational biology and chemistry</title><addtitle>Comput Biol Chem</addtitle><description>Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identification and understanding of lysine crotonylation sites is crucial in the field of protein research. However, due to the increasing amount of non-histone crotonylation sites, existing classifiers based on traditional machine learning may encounter performance limitations. 
In order to address this problem, a novel deep learning-based model for identifying crotonylation sites is presented in this study, given the unique advantages of deep learning techniques for sequence data analysis. In this study, an MLP-Attention-based model was developed for the identification of crotonylation sites. Firstly, three feature extraction strategies, namely Amino Acid Composition, K-mer, and Distance-based residue features extraction strategy, were used to encode crotonylated and non-crotonylated sequences. Then, in order to balance the training dataset, the FCM-GRNN undersampling algorithm combining fuzzy clustering and generalized neural network approaches was introduced. Finally, to improve the effectiveness of crotonylation site identification, we explored various classification algorithms, and based on the relevant experimental performance comparisons, the multilayer perceptron (MLP) combined with the superimposed self-attention mechanism was finally selected to construct the prediction model ILYCROsite. The results obtained from independent testing and five-fold cross-validation demonstrated that the model proposed in this study, ILYCROsite, had excellent performance. Notably, on the independent test set, ILYCROsite achieves an AUC value of 87.93 %, which is significantly better than the existing state-of-the-art models. In addition, SHAP (Shapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. Meanwhile, in order to facilitate researchers to use the prediction model constructed in this study, we developed a prediction program to identify the crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite. 
[Display omitted] •Help researchers better understand lysine crotonylation sites and associated mechanisms.•Using computational methods to predict related sites is less costly and faster.•Introduction to applications and innovations in lysine crotonylation site recognition in recent years.•Compare differences of models in the design of algorithms and feature extracted.•Discussion of shortcomings and future prospects for this topic at the intersection of biology and computing.</description><subject>Algorithms</subject><subject>Fully connected neural network</subject><subject>Imbalance data processing</subject><subject>Lysine - chemistry</subject><subject>Lysine - metabolism</subject><subject>Neural Networks, Computer</subject><subject>Protein lysine crotonylation</subject><subject>Protein Processing, Post-Translational</subject><subject>Proteins - chemistry</subject><subject>Proteins - metabolism</subject><subject>Sequence analysis</subject><issn>1476-9271</issn><issn>1476-928X</issn><issn>1476-928X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkF1PwjAUhhujEfz4C2bxypthPza6cWdQkAQhIVzoVdN1Z1KytbhuJvx7S4bES6_anvOcvjkPQvcEDwgmw8ftQNlql2lbqg1UA4pp5BsJJfQM9UnEh2FKk_fz052THrpybosxZRjHl6jHfJGncdpH2Wz-MV4tnW5gFMxyMI0utJKNtiawRVDunTYQqNo21uzLrn6AXZBJB3ngn5PxWzhdLRZBa3Konax2pTafQQNqY_RXCzfoopClg9vjeY3Wk5f1-DWcL6ez8dM8VDRmTRiRQvFcpZwlTOUgCQHfSGIpGU1YRFIW85imEWYFSAY84jLLcMyBDimlGbtGD923u9r6VNeISjsFZSkN2NYJRnAcpYQQ6tFRh_q9nKuhELtaV7LeC4LFQbHYir-KxUGx6BT74btjTptVkJ9Gf5164LkDwC_7raEWTmkwCnJdg2pEbvV_cn4A9yCT8Q</recordid><startdate>202412</startdate><enddate>202412</enddate><creator>Zuo, Yun</creator><creator>Wan, Minquan</creator><creator>Shen, Yang</creator><creator>Wang, Xinheng</creator><creator>He, Wenying</creator><creator>Bi, Yue</creator><creator>Liu, Xiangrong</creator><creator>Deng, Zhaohong</creator><general>Elsevier 
Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0009-0009-3877-8102</orcidid><orcidid>https://orcid.org/0009-0009-2452-7580</orcidid></search><sort><creationdate>202412</creationdate><title>ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique</title><author>Zuo, Yun ; Wan, Minquan ; Shen, Yang ; Wang, Xinheng ; He, Wenying ; Bi, Yue ; Liu, Xiangrong ; Deng, Zhaohong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c253t-41fc7dc97383cdea11ec2585aa3283419357529403fea3e747abb057e26222b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Fully connected neural network</topic><topic>Imbalance data processing</topic><topic>Lysine - chemistry</topic><topic>Lysine - metabolism</topic><topic>Neural Networks, Computer</topic><topic>Protein lysine crotonylation</topic><topic>Protein Processing, Post-Translational</topic><topic>Proteins - chemistry</topic><topic>Proteins - metabolism</topic><topic>Sequence analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zuo, Yun</creatorcontrib><creatorcontrib>Wan, Minquan</creatorcontrib><creatorcontrib>Shen, Yang</creatorcontrib><creatorcontrib>Wang, Xinheng</creatorcontrib><creatorcontrib>He, Wenying</creatorcontrib><creatorcontrib>Bi, Yue</creatorcontrib><creatorcontrib>Liu, Xiangrong</creatorcontrib><creatorcontrib>Deng, Zhaohong</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Computational biology 
and chemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zuo, Yun</au><au>Wan, Minquan</au><au>Shen, Yang</au><au>Wang, Xinheng</au><au>He, Wenying</au><au>Bi, Yue</au><au>Liu, Xiangrong</au><au>Deng, Zhaohong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique</atitle><jtitle>Computational biology and chemistry</jtitle><addtitle>Comput Biol Chem</addtitle><date>2024-12</date><risdate>2024</risdate><volume>113</volume><spage>108212</spage><pages>108212-</pages><artnum>108212</artnum><issn>1476-9271</issn><issn>1476-928X</issn><eissn>1476-928X</eissn><abstract>Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identification and understanding of lysine crotonylation sites is crucial in the field of protein research. However, due to the increasing amount of non-histone crotonylation sites, existing classifiers based on traditional machine learning may encounter performance limitations. In order to address this problem, a novel deep learning-based model for identifying crotonylation sites is presented in this study, given the unique advantages of deep learning techniques for sequence data analysis. In this study, an MLP-Attention-based model was developed for the identification of crotonylation sites. Firstly, three feature extraction strategies, namely Amino Acid Composition, K-mer, and Distance-based residue features extraction strategy, were used to encode crotonylated and non-crotonylated sequences. Then, in order to balance the training dataset, the FCM-GRNN undersampling algorithm combining fuzzy clustering and generalized neural network approaches was introduced. 
Finally, to improve the effectiveness of crotonylation site identification, we explored various classification algorithms, and based on the relevant experimental performance comparisons, the multilayer perceptron (MLP) combined with the superimposed self-attention mechanism was finally selected to construct the prediction model ILYCROsite. The results obtained from independent testing and five-fold cross-validation demonstrated that the model proposed in this study, ILYCROsite, had excellent performance. Notably, on the independent test set, ILYCROsite achieves an AUC value of 87.93 %, which is significantly better than the existing state-of-the-art models. In addition, SHAP (Shapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. Meanwhile, in order to facilitate researchers to use the prediction model constructed in this study, we developed a prediction program to identify the crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite. [Display omitted] •Help researchers better understand lysine crotonylation sites and associated mechanisms.•Using computational methods to predict related sites is less costly and faster.•Introduction to applications and innovations in lysine crotonylation site recognition in recent years.•Compare differences of models in the design of algorithms and feature extracted.•Discussion of shortcomings and future prospects for this topic at the intersection of biology and computing.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>39277959</pmid><doi>10.1016/j.compbiolchem.2024.108212</doi><orcidid>https://orcid.org/0009-0009-3877-8102</orcidid><orcidid>https://orcid.org/0009-0009-2452-7580</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1476-9271
ispartof Computational biology and chemistry, 2024-12, Vol.113, p.108212, Article 108212
issn 1476-9271
1476-928X
1476-928X
language eng
recordid cdi_proquest_miscellaneous_3105491112
source MEDLINE; ScienceDirect Journals (5 years ago - present)
subjects Algorithms
Fully connected neural network
Imbalance data processing
Lysine - chemistry
Lysine - metabolism
Neural Networks, Computer
Protein lysine crotonylation
Protein Processing, Post-Translational
Proteins - chemistry
Proteins - metabolism
Sequence analysis
title ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A25%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ILYCROsite:%20Identification%20of%20lysine%20crotonylation%20sites%20based%20on%20FCM-GRNN%20undersampling%20technique&rft.jtitle=Computational%20biology%20and%20chemistry&rft.au=Zuo,%20Yun&rft.date=2024-12&rft.volume=113&rft.spage=108212&rft.pages=108212-&rft.artnum=108212&rft.issn=1476-9271&rft.eissn=1476-928X&rft_id=info:doi/10.1016/j.compbiolchem.2024.108212&rft_dat=%3Cproquest_cross%3E3105491112%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3105491112&rft_id=info:pmid/39277959&rft_els_id=S1476927124002007&rfr_iscdi=true