Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies

•LGB-IPs is a new LGB-based optimal feature fusion model for the accurate prediction of STY phosphorylation sites.
•LGB-IPs explores ten different feature descriptors and assesses their discriminative capability using five different classifiers.
•Extensive cross-validation and independent assessment sho...

Detailed description

Bibliographic details
Published in: Methods (San Diego, Calif.), 2024-09, Vol.229, p.1-8
Main authors: Sabir, Mumdooh J., Kamli, Majid Rasool, Atef, Ahmed, Alhibshi, Alawiah M., Edris, Sherif, Hajarah, Nahid H., Bahieldin, Ahmed, Manavalan, Balachandran, Sabir, Jamal S.M.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 8
container_issue
container_start_page 1
container_title Methods (San Diego, Calif.)
container_volume 229
creator Sabir, Mumdooh J.
Kamli, Majid Rasool
Atef, Ahmed
Alhibshi, Alawiah M.
Edris, Sherif
Hajarah, Nahid H.
Bahieldin, Ahmed
Manavalan, Balachandran
Sabir, Jamal S.M.
description •LGB-IPs is a new LGB-based optimal feature fusion model for the accurate prediction of STY phosphorylation sites. •LGB-IPs explores ten different feature descriptors and assesses their discriminative capability using five different classifiers. •Extensive cross-validation and independent assessment show that LGB-IPs consistently outperformed the single-feature models. The global spread of SARS-CoV-2 has caused a critical health and economic emergency, affecting countless individuals. Understanding the virus's phosphorylation sites is vital to unraveling the molecular intricacies of the infection and the subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on either S/T or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, which underlines the need for improved methods. In this study, we developed a novel predictor that integrates phosphorylation site information for all three residues (STY). We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power with five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance and that five descriptors displayed excellent discriminative capabilities. Subsequently, we identified that the top two integrated features have high discriminative capability and trained them with LGB to develop the final prediction model, LGB-IPs. The proposed approach performs well in 10-fold cross-validation, with ACC, MCC, and AUC values of 0.831, 0.662, and 0.907, respectively. Notably, this performance is replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.
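The description names three ingredients that are easier to follow with a small example: candidate S/T/Y sites, fusion of several feature descriptors, and evaluation with ACC and MCC. The following is a minimal sketch in plain Python under stated assumptions, not the authors' implementation. The record does not list the ten descriptors, so the two used here (amino acid composition and a binary position-specific encoding) are illustrative stand-ins, fusion is assumed to be simple concatenation, and the window size, the padding character `X`, and all function names are hypothetical. The LGB classifier itself is left out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def extract_windows(sequence, flank=3, pad="X"):
    """Yield (position, window) for every S, T, or Y residue.

    The window holds `flank` residues on each side of the candidate site;
    the sequence termini are padded with `pad` (an assumed convention).
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue in "STY":
            yield i, padded[i:i + 2 * flank + 1]


def composition(window):
    """Composition descriptor: fraction of each standard residue (padding ignored)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero on an all-padding window
    return [residues.count(a) / n for a in AMINO_ACIDS]


def one_hot(window):
    """Position-specific descriptor: 20 indicator values per window position."""
    return [1.0 if r == a else 0.0 for r in window for a in AMINO_ACIDS]


def fuse(window):
    """Feature fusion, assumed here to be concatenation of the two descriptors."""
    return composition(window) + one_hot(window)


def acc_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy sequence, not taken from the study's dataset.
    for pos, window in extract_windows("MSAYT", flank=1):
        print(pos, window, len(fuse(window)))
    # Made-up confusion counts, only to show the metric calculation.
    print(acc_mcc(tp=40, tn=45, fp=5, fn=10))
```

In a fuller pipeline, each fused vector would become one row of the training matrix for a gradient boosting classifier, and the confusion counts would come from 10-fold cross-validation rather than from hand-picked numbers.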
doi_str_mv 10.1016/j.ymeth.2024.04.021
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3153655505</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1046202324001300</els_id><sourcerecordid>3057692185</sourcerecordid><originalsourceid>FETCH-LOGICAL-c272t-73bc2577fadbfc3010dfa5aad5097f25f5a72b8584b7b1506c4a3a88eb92c7763</originalsourceid><addsrcrecordid>eNqFUU1r3DAQFaWl-Wh_QaD42Is3I2ll2YcewpIvCBSatlchy6NEi225khzY_PrK2U2OLYwYzbz3ZmAeIWcUVhRodb5d7QZMjysGbL2CHIy-I8cUGlE2lMP75b-uygzzI3IS4xYAKJP1R3LEa1nVDWfH5Hnjh2lOOjk_6r6YAnbOLEXhbTE9-phf2PUveBFdwrgA9xc_7suN_12ywo0W94I5uvGhsKjTHLCwucxNPXaFn5Ib3PNhRgo64YPD-Il8sLqP-PmQT8mvq8ufm5vy7vv17ebirjRMslRK3hompLS6a63hQKGzWmjdCWikZcIKLVlbi3rdypYKqMxac13X2DbMSFnxU_J1P3cK_s-MManBRYN9r0f0c1ScCl4JIUD8nwpCVg2j9ULle6oJPsaAVk3BDTrsFAW1-KO26sUftfijIAejWfXlsGBuB-zeNK-GZMK3PQHzRZ4cBhWNw9FkW0K-s-q8--eCv4qbpKU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3057692185</pqid></control><display><type>article</type><title>Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies</title><source>Elsevier ScienceDirect Journals</source><creator>Sabir, Mumdooh J. ; Kamli, Majid Rasool ; Atef, Ahmed ; Alhibshi, Alawiah M. ; Edris, Sherif ; Hajarah, Nahid H. ; Bahieldin, Ahmed ; Manavalan, Balachandran ; Sabir, Jamal S.M.</creator><creatorcontrib>Sabir, Mumdooh J. ; Kamli, Majid Rasool ; Atef, Ahmed ; Alhibshi, Alawiah M. ; Edris, Sherif ; Hajarah, Nahid H. 
; Bahieldin, Ahmed ; Manavalan, Balachandran ; Sabir, Jamal S.M.</creatorcontrib><description>•LGB-IPs is a new LGB-based optimal feature fusion model for the accurate prediction of STY phosphorylation sites.•LGB-IPs explores ten different feature descriptors and assesses its discriminative capability using five different classifiers.•Extensive cross-validation and independent assessment shows that LGB-IPs outperformed the single feature models consistently. SARS-CoV-2′s global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital to unravel the molecular intricacies of the infection and subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on specific residue (S/T) or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrated all the residues (STY) phosphorylation sites information. We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power through five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance, and five descriptors displayed excellent discriminative capabilities. Subsequently, we identified the top two integrated features have high discriminative capability and trained with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows an excellent performance on 10-fold cross-validation with an ACC, MCC, and AUC values of 0.831, 0.662, 0.907, respectively. Notably, these performances are replicated in the independent evaluation. 
Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.</description><identifier>ISSN: 1046-2023</identifier><identifier>ISSN: 1095-9130</identifier><identifier>EISSN: 1095-9130</identifier><identifier>DOI: 10.1016/j.ymeth.2024.04.021</identifier><identifier>PMID: 38768932</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Bioinformatics ; Light gradient boosting ; phosphorylation ; Phosphorylation sites ; prediction ; SARS-CoV-2 ; Sequence analysis ; Severe acute respiratory syndrome coronavirus 2 ; viruses</subject><ispartof>Methods (San Diego, Calif.), 2024-09, Vol.229, p.1-8</ispartof><rights>2024 Elsevier Inc.</rights><rights>Copyright © 2024 Elsevier Inc. All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c272t-73bc2577fadbfc3010dfa5aad5097f25f5a72b8584b7b1506c4a3a88eb92c7763</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1046202324001300$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38768932$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sabir, Mumdooh J.</creatorcontrib><creatorcontrib>Kamli, Majid Rasool</creatorcontrib><creatorcontrib>Atef, Ahmed</creatorcontrib><creatorcontrib>Alhibshi, Alawiah M.</creatorcontrib><creatorcontrib>Edris, Sherif</creatorcontrib><creatorcontrib>Hajarah, Nahid H.</creatorcontrib><creatorcontrib>Bahieldin, Ahmed</creatorcontrib><creatorcontrib>Manavalan, Balachandran</creatorcontrib><creatorcontrib>Sabir, Jamal S.M.</creatorcontrib><title>Computational prediction of 
phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies</title><title>Methods (San Diego, Calif.)</title><addtitle>Methods</addtitle><description>•LGB-IPs is a new LGB-based optimal feature fusion model for the accurate prediction of STY phosphorylation sites.•LGB-IPs explores ten different feature descriptors and assesses its discriminative capability using five different classifiers.•Extensive cross-validation and independent assessment shows that LGB-IPs outperformed the single feature models consistently. SARS-CoV-2′s global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital to unravel the molecular intricacies of the infection and subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on specific residue (S/T) or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrated all the residues (STY) phosphorylation sites information. We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power through five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance, and five descriptors displayed excellent discriminative capabilities. Subsequently, we identified the top two integrated features have high discriminative capability and trained with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows an excellent performance on 10-fold cross-validation with an ACC, MCC, and AUC values of 0.831, 0.662, 0.907, respectively. 
Notably, these performances are replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.</description><subject>Bioinformatics</subject><subject>Light gradient boosting</subject><subject>phosphorylation</subject><subject>Phosphorylation sites</subject><subject>prediction</subject><subject>SARS-CoV-2</subject><subject>Sequence analysis</subject><subject>Severe acute respiratory syndrome coronavirus 2</subject><subject>viruses</subject><issn>1046-2023</issn><issn>1095-9130</issn><issn>1095-9130</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqFUU1r3DAQFaWl-Wh_QaD42Is3I2ll2YcewpIvCBSatlchy6NEi225khzY_PrK2U2OLYwYzbz3ZmAeIWcUVhRodb5d7QZMjysGbL2CHIy-I8cUGlE2lMP75b-uygzzI3IS4xYAKJP1R3LEa1nVDWfH5Hnjh2lOOjk_6r6YAnbOLEXhbTE9-phf2PUveBFdwrgA9xc_7suN_12ywo0W94I5uvGhsKjTHLCwucxNPXaFn5Ib3PNhRgo64YPD-Il8sLqP-PmQT8mvq8ufm5vy7vv17ebirjRMslRK3hompLS6a63hQKGzWmjdCWikZcIKLVlbi3rdypYKqMxac13X2DbMSFnxU_J1P3cK_s-MManBRYN9r0f0c1ScCl4JIUD8nwpCVg2j9ULle6oJPsaAVk3BDTrsFAW1-KO26sUftfijIAejWfXlsGBuB-zeNK-GZMK3PQHzRZ4cBhWNw9FkW0K-s-q8--eCv4qbpKU</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Sabir, Mumdooh J.</creator><creator>Kamli, Majid Rasool</creator><creator>Atef, Ahmed</creator><creator>Alhibshi, Alawiah M.</creator><creator>Edris, Sherif</creator><creator>Hajarah, Nahid H.</creator><creator>Bahieldin, Ahmed</creator><creator>Manavalan, Balachandran</creator><creator>Sabir, Jamal S.M.</creator><general>Elsevier Inc</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7S9</scope><scope>L.6</scope></search><sort><creationdate>20240901</creationdate><title>Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies</title><author>Sabir, Mumdooh 
J. ; Kamli, Majid Rasool ; Atef, Ahmed ; Alhibshi, Alawiah M. ; Edris, Sherif ; Hajarah, Nahid H. ; Bahieldin, Ahmed ; Manavalan, Balachandran ; Sabir, Jamal S.M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c272t-73bc2577fadbfc3010dfa5aad5097f25f5a72b8584b7b1506c4a3a88eb92c7763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Bioinformatics</topic><topic>Light gradient boosting</topic><topic>phosphorylation</topic><topic>Phosphorylation sites</topic><topic>prediction</topic><topic>SARS-CoV-2</topic><topic>Sequence analysis</topic><topic>Severe acute respiratory syndrome coronavirus 2</topic><topic>viruses</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sabir, Mumdooh J.</creatorcontrib><creatorcontrib>Kamli, Majid Rasool</creatorcontrib><creatorcontrib>Atef, Ahmed</creatorcontrib><creatorcontrib>Alhibshi, Alawiah M.</creatorcontrib><creatorcontrib>Edris, Sherif</creatorcontrib><creatorcontrib>Hajarah, Nahid H.</creatorcontrib><creatorcontrib>Bahieldin, Ahmed</creatorcontrib><creatorcontrib>Manavalan, Balachandran</creatorcontrib><creatorcontrib>Sabir, Jamal S.M.</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>AGRICOLA</collection><collection>AGRICOLA - Academic</collection><jtitle>Methods (San Diego, Calif.)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sabir, Mumdooh J.</au><au>Kamli, Majid Rasool</au><au>Atef, Ahmed</au><au>Alhibshi, Alawiah M.</au><au>Edris, Sherif</au><au>Hajarah, Nahid H.</au><au>Bahieldin, Ahmed</au><au>Manavalan, Balachandran</au><au>Sabir, Jamal S.M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Computational prediction of phosphorylation sites of SARS-CoV-2 infection using 
feature fusion and optimization strategies</atitle><jtitle>Methods (San Diego, Calif.)</jtitle><addtitle>Methods</addtitle><date>2024-09-01</date><risdate>2024</risdate><volume>229</volume><spage>1</spage><epage>8</epage><pages>1-8</pages><issn>1046-2023</issn><issn>1095-9130</issn><eissn>1095-9130</eissn><abstract>•LGB-IPs is a new LGB-based optimal feature fusion model for the accurate prediction of STY phosphorylation sites.•LGB-IPs explores ten different feature descriptors and assesses its discriminative capability using five different classifiers.•Extensive cross-validation and independent assessment shows that LGB-IPs outperformed the single feature models consistently. SARS-CoV-2′s global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital to unravel the molecular intricacies of the infection and subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on specific residue (S/T) or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrated all the residues (STY) phosphorylation sites information. We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power through five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance, and five descriptors displayed excellent discriminative capabilities. Subsequently, we identified the top two integrated features have high discriminative capability and trained with LGB to develop the final prediction model, LGB-IPs. 
The proposed approach shows an excellent performance on 10-fold cross-validation with an ACC, MCC, and AUC values of 0.831, 0.662, 0.907, respectively. Notably, these performances are replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>38768932</pmid><doi>10.1016/j.ymeth.2024.04.021</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1046-2023
ispartof Methods (San Diego, Calif.), 2024-09, Vol.229, p.1-8
issn 1046-2023
1095-9130
language eng
recordid cdi_proquest_miscellaneous_3153655505
source Elsevier ScienceDirect Journals
subjects Bioinformatics
Light gradient boosting
phosphorylation
Phosphorylation sites
prediction
SARS-CoV-2
Sequence analysis
Severe acute respiratory syndrome coronavirus 2
viruses
title Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T06%3A19%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20prediction%20of%20phosphorylation%20sites%20of%20SARS-CoV-2%20infection%20using%20feature%20fusion%20and%20optimization%20strategies&rft.jtitle=Methods%20(San%20Diego,%20Calif.)&rft.au=Sabir,%20Mumdooh%20J.&rft.date=2024-09-01&rft.volume=229&rft.spage=1&rft.epage=8&rft.pages=1-8&rft.issn=1046-2023&rft.eissn=1095-9130&rft_id=info:doi/10.1016/j.ymeth.2024.04.021&rft_dat=%3Cproquest_cross%3E3057692185%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3057692185&rft_id=info:pmid/38768932&rft_els_id=S1046202324001300&rfr_iscdi=true