Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model

Abstract. Motivation: O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues wit...

Detailed description

Bibliographic details
Published in: Bioinformatics (Oxford, England), 2024-11, Vol.40 (11)
Main authors: Pakhrin, Subash Chandra; Chauhan, Neha; Khan, Salman; Upadhyaya, Jamie; Beck, Moriah Rene; Blanco, Eduardo
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue 11
container_start_page
container_title Bioinformatics (Oxford, England)
container_volume 40
creator Pakhrin, Subash Chandra
Chauhan, Neha
Khan, Salman
Upadhyaya, Jamie
Beck, Moriah Rene
Blanco, Eduardo
description Abstract
Motivation: O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model’s embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites.
Results: Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively, on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins.
Availability and implementation: The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.
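The stacked generalization scheme and the four reported metrics named in the description can be sketched roughly as follows. This is only an illustrative toy under loudly stated assumptions: logistic regression stands in for the multi-layer perceptrons, random Gaussian vectors stand in for the ProtT5 and Ankh embeddings, and 5-fold out-of-fold predictions are assumed for training the meta-learner, since this record does not state the authors' exact protocol. All function names are hypothetical and are not taken from the Stack-OglyPred-PLM repository.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, X):
    # w holds one weight per feature followed by the bias term.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) for x in X]


def fit_logreg(X, y, epochs=200, lr=0.1):
    # Logistic regression by batch gradient descent (assumed stand-in for an MLP).
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, p, t in zip(X, predict_proba(w, X), y):
            for j in range(d):
                grad[j] += (p - t) * x[j]
            grad[-1] += p - t
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def out_of_fold(X, y, k=5):
    # Each sample is scored by a base model that never saw it during fitting.
    oof = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        w = fit_logreg([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, predict_proba(w, [X[i] for i in held])):
            oof[i] = p
    return oof


def fit_stack(views, y):
    # views: one feature matrix per embedding source, rows aligned with y.
    meta_X = [list(row) for row in zip(*(out_of_fold(X, y) for X in views))]
    base = [fit_logreg(X, y) for X in views]  # refit base models on all rows
    return base, fit_logreg(meta_X, y)


def predict_stack(model, views):
    # Base probabilities become the input features of the meta-learner.
    base, meta = model
    cols = [predict_proba(w, X) for w, X in zip(base, views)]
    return predict_proba(meta, [list(row) for row in zip(*cols)])


def binary_metrics(y_true, y_pred):
    # Sensitivity, specificity, accuracy and Matthews correlation coefficient.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy data: Gaussian features shifted by the label, NOT real embeddings.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(200)]
view_a = [[rng.gauss(t, 1.0) for _ in range(4)] for t in y]  # stand-in for ProtT5
view_b = [[rng.gauss(t, 1.0) for _ in range(4)] for t in y]  # stand-in for Ankh

model = fit_stack([view_a[:150], view_b[:150]], y[:150])
probs = predict_stack(model, [view_a[150:], view_b[150:]])
preds = [1 if p >= 0.5 else 0 for p in probs]
report = binary_metrics(y[150:], preds)
print(report)
```

The printed values depend entirely on the toy data and say nothing about the published results. On a test set where negatives far outnumber positives, high sensitivity, specificity and accuracy can coexist with a moderate Matthews correlation coefficient, which is why the four metrics are usually reported together.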
doi_str_mv 10.1093/bioinformatics/btae643
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11552629</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btae643</oup_id><sourcerecordid>3120593667</sourcerecordid><originalsourceid>FETCH-LOGICAL-c314t-c3f71a8b17ad89e1fa8ebba84331b4d059dae94609754dc551fa44182f500de13</originalsourceid><addsrcrecordid>eNqNkc1rFTEUxYNYbG39F0rAjZuxycvHzKxEil9QqIt2HTLJnWlqJhmTTOG59g837XuW1pWb3MD93cM5HIROKXlPSc_OBhddGGOadXEmnw1Fg-TsBTqiTLYN7yh9-eR_iF7nfEsIEUTIV-iQ9Zy3RPRH6Pf3BNaZ4mLAccQ366wDvmy8Cz_A4slvTcxbrx_22RXIeM0uTDgXbR4ICJC0d792iA4WwzyAtRXKeExxxkuCpiTtQsWXFAu4gL0O06onwHO04E_Qwah9hjf7eYyuP3-6Ov_aXFx--Xb-8aIxjPJS37Gluhtoq23XAx11B8OgO84YHbiteayGnkvSt4JbI0QlOKfdZhSEWKDsGH3Y6S7rMIM1EKovr5bkZp22Kmqnnm-Cu1FTvFOUCrGRm74qvNsrpPhzhVzU7LIBX_NAXLNidFNtMCnbir79B72Nawo13z3VEcoEk5WSO8qkmHOC8dENJeq-afW8abVvuh6ePs3yePa32grQHRDX5X9F_wBb-MBW</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3128013536</pqid></control><display><type>article</type><title>Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Pakhrin, Subash Chandra ; Chauhan, Neha ; Khan, Salman ; Upadhyaya, Jamie ; Beck, Moriah Rene ; Blanco, Eduardo</creator><contributor>Gao, Xin</contributor><creatorcontrib>Pakhrin, Subash Chandra ; Chauhan, Neha ; Khan, Salman ; Upadhyaya, Jamie ; Beck, Moriah Rene ; Blanco, Eduardo ; Gao, Xin</creatorcontrib><description>Abstract Motivation O-linked glycosylation, an essential post-translational modification process in Homo sapiens, 
involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model’s embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites. Results Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins. 
Availability and implementation The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.</description><identifier>ISSN: 1367-4811</identifier><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btae643</identifier><identifier>PMID: 39447059</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Accuracy ; Availability ; Biological activity ; Computational Biology - methods ; Correlation coefficient ; Correlation coefficients ; Databases, Protein ; Datasets ; Embedding ; Glycosylation ; Humans ; Multilayer perceptrons ; Multilayers ; Neural Networks, Computer ; Original Paper ; Oxygen atoms ; Post-translation ; Predictions ; Protein Processing, Post-Translational ; Proteins ; Proteins - chemistry ; Proteins - metabolism ; Proteomics ; Residues ; Serine ; Software ; Threonine</subject><ispartof>Bioinformatics (Oxford, England), 2024-11, Vol.40 (11)</ispartof><rights>The Author(s) 2024. Published by Oxford University Press. 2024</rights><rights>The Author(s) 2024. 
Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c314t-c3f71a8b17ad89e1fa8ebba84331b4d059dae94609754dc551fa44182f500de13</cites><orcidid>0009-0009-3310-2939</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552629/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552629/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,1604,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39447059$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Gao, Xin</contributor><creatorcontrib>Pakhrin, Subash Chandra</creatorcontrib><creatorcontrib>Chauhan, Neha</creatorcontrib><creatorcontrib>Khan, Salman</creatorcontrib><creatorcontrib>Upadhyaya, Jamie</creatorcontrib><creatorcontrib>Beck, Moriah Rene</creatorcontrib><creatorcontrib>Blanco, Eduardo</creatorcontrib><title>Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. 
While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model’s embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites. Results Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins. 
Availability and implementation The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.</description><subject>Accuracy</subject><subject>Availability</subject><subject>Biological activity</subject><subject>Computational Biology - methods</subject><subject>Correlation coefficient</subject><subject>Correlation coefficients</subject><subject>Databases, Protein</subject><subject>Datasets</subject><subject>Embedding</subject><subject>Glycosylation</subject><subject>Humans</subject><subject>Multilayer perceptrons</subject><subject>Multilayers</subject><subject>Neural Networks, Computer</subject><subject>Original Paper</subject><subject>Oxygen atoms</subject><subject>Post-translation</subject><subject>Predictions</subject><subject>Protein Processing, Post-Translational</subject><subject>Proteins</subject><subject>Proteins - chemistry</subject><subject>Proteins - metabolism</subject><subject>Proteomics</subject><subject>Residues</subject><subject>Serine</subject><subject>Software</subject><subject>Threonine</subject><issn>1367-4811</issn><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkc1rFTEUxYNYbG39F0rAjZuxycvHzKxEil9QqIt2HTLJnWlqJhmTTOG59g837XuW1pWb3MD93cM5HIROKXlPSc_OBhddGGOadXEmnw1Fg-TsBTqiTLYN7yh9-eR_iF7nfEsIEUTIV-iQ9Zy3RPRH6Pf3BNaZ4mLAccQ366wDvmy8Cz_A4slvTcxbrx_22RXIeM0uTDgXbR4ICJC0d792iA4WwzyAtRXKeExxxkuCpiTtQsWXFAu4gL0O06onwHO04E_Qwah9hjf7eYyuP3-6Ov_aXFx--Xb-8aIxjPJS37Gluhtoq23XAx11B8OgO84YHbiteayGnkvSt4JbI0QlOKfdZhSEWKDsGH3Y6S7rMIM1EKovr5bkZp22Kmqnnm-Cu1FTvFOUCrGRm74qvNsrpPhzhVzU7LIBX_NAXLNidFNtMCnbir79B72Nawo13z3VEcoEk5WSO8qkmHOC8dENJeq-afW8abVvuh6ePs3yePa32grQHRDX5X9F_wBb-MBW</recordid><startdate>20241101</startdate><enddate>20241101</enddate><creator>Pakhrin, Subash Chandra</creator><creator>Chauhan, 
Neha</creator><creator>Khan, Salman</creator><creator>Upadhyaya, Jamie</creator><creator>Beck, Moriah Rene</creator><creator>Blanco, Eduardo</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0009-0009-3310-2939</orcidid></search><sort><creationdate>20241101</creationdate><title>Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model</title><author>Pakhrin, Subash Chandra ; Chauhan, Neha ; Khan, Salman ; Upadhyaya, Jamie ; Beck, Moriah Rene ; Blanco, Eduardo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c314t-c3f71a8b17ad89e1fa8ebba84331b4d059dae94609754dc551fa44182f500de13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Availability</topic><topic>Biological activity</topic><topic>Computational Biology - methods</topic><topic>Correlation coefficient</topic><topic>Correlation coefficients</topic><topic>Databases, Protein</topic><topic>Datasets</topic><topic>Embedding</topic><topic>Glycosylation</topic><topic>Humans</topic><topic>Multilayer 
perceptrons</topic><topic>Multilayers</topic><topic>Neural Networks, Computer</topic><topic>Original Paper</topic><topic>Oxygen atoms</topic><topic>Post-translation</topic><topic>Predictions</topic><topic>Protein Processing, Post-Translational</topic><topic>Proteins</topic><topic>Proteins - chemistry</topic><topic>Proteins - metabolism</topic><topic>Proteomics</topic><topic>Residues</topic><topic>Serine</topic><topic>Software</topic><topic>Threonine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pakhrin, Subash Chandra</creatorcontrib><creatorcontrib>Chauhan, Neha</creatorcontrib><creatorcontrib>Khan, Salman</creatorcontrib><creatorcontrib>Upadhyaya, Jamie</creatorcontrib><creatorcontrib>Beck, Moriah Rene</creatorcontrib><creatorcontrib>Blanco, Eduardo</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering 
Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pakhrin, Subash Chandra</au><au>Chauhan, Neha</au><au>Khan, Salman</au><au>Upadhyaya, Jamie</au><au>Beck, Moriah Rene</au><au>Blanco, Eduardo</au><au>Gao, Xin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2024-11-01</date><risdate>2024</risdate><volume>40</volume><issue>11</issue><issn>1367-4811</issn><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Abstract Motivation O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. 
While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model’s embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites. Results Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins. 
Availability and implementation The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>39447059</pmid><doi>10.1093/bioinformatics/btae643</doi><orcidid>https://orcid.org/0009-0009-3310-2939</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4811
ispartof Bioinformatics (Oxford, England), 2024-11, Vol.40 (11)
issn 1367-4811
1367-4803
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11552629
source MEDLINE; DOAJ Directory of Open Access Journals; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects Accuracy
Availability
Biological activity
Computational Biology - methods
Correlation coefficient
Correlation coefficients
Databases, Protein
Datasets
Embedding
Glycosylation
Humans
Multilayer perceptrons
Multilayers
Neural Networks, Computer
Original Paper
Oxygen atoms
Post-translation
Predictions
Protein Processing, Post-Translational
Proteins
Proteins - chemistry
Proteins - metabolism
Proteomics
Residues
Serine
Software
Threonine
title Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T16%3A44%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20human%20O-linked%20glycosylation%20sites%20using%20stacked%20generalization%20and%20embeddings%20from%20pre-trained%20protein%20language%20model&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Pakhrin,%20Subash%20Chandra&rft.date=2024-11-01&rft.volume=40&rft.issue=11&rft.issn=1367-4811&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btae643&rft_dat=%3Cproquest_pubme%3E3120593667%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3128013536&rft_id=info:pmid/39447059&rft_oup_id=10.1093/bioinformatics/btae643&rfr_iscdi=true