AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique

Plant–pathogen protein–protein interactions (PPIs) play crucial roles in the arms race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. Computational prediction methods...

Detailed description

Bibliographic details
Published in: Journal of proteome research 2024-01, Vol.23 (1), p.494-499
Main authors: Lei, Chenping; Zhou, Kewei; Zheng, Jingyan; Zhao, Miao; Huang, Yan; He, Huaqin; Yang, Shiping; Zhang, Ziding
Format: Article
Language: English
Online access: Full text
container_end_page 499
container_issue 1
container_start_page 494
container_title Journal of proteome research
container_volume 23
creator Lei, Chenping
Zhou, Kewei
Zheng, Jingyan
Zhao, Miao
Huang, Yan
He, Huaqin
Yang, Shiping
Zhang, Ziding
description Plant–pathogen protein–protein interactions (PPIs) play crucial roles in the arms race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. Computational prediction methods can complement experimental efforts, but their predictive performance still needs to be improved. Motivated by the rapid development of natural language processing and its successful applications in the field of protein bioinformatics, here we present an improved XGBoost-based plant–pathogen PPI predictor (i.e., AraPathogen2.0), in which sequence encodings from the pretrained protein language model ESM2 and Arabidopsis PPI network-related node representations from the graph embedding technique struc2vec are used as input. Stringent benchmark experiments showed that AraPathogen2.0 could achieve better performance than its preceding version, especially for processing the test data set with novel proteins unseen in the training data.
doi_str_mv 10.1021/acs.jproteome.3c00364
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2902973304</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2902973304</sourcerecordid><originalsourceid>FETCH-LOGICAL-a299t-18bdfdb6a00bacf67455d9db7ad5f4932d9604d638ff0155f6af1df13bd809c63</originalsourceid><addsrcrecordid>eNqFUc1uEzEQthCIlsIjgHzkknS8Xu-uuUVVCpGikkM5r7z2ONkqawfbW9Qbj1CJN-yT1GmSXjnNaPz96PNHyGcGUwYFu1Q6Tu92wSf0A065BuBV-YacM8HFhEuo3572RvIz8iHGOwAmauDvyRlvoJINiHPyOAtqpdLGr9EVU_hGZ44uhqx7j4auAppep9476i1dbZVLT3__neD5Obv3bn86bHThEgb1woh0Puz8H8wStHugaYP0RqUxqC1dKrce1Rr3Ahpj7N2a3qLeuP73iB_JO6u2ET8d5wX5dT2_vfoxWf78vriaLSeqkDJNWNMZa7pKAXRK26ouhTDSdLUywpaSF0ZWUJqKN9bm2MJWyjJjGe9MA1JX_IJ8PejmrNk2pnboo8ZtDol-jG0hoZA151BmqDhAdfAxBrTtLvSDCg8tg3ZfRpvLaF_LaI9lZN6Xo8XYDWheWaffzwB2ALzw_RhcTvwf0WcwIZ-S</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2902973304</pqid></control><display><type>article</type><title>AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique</title><source>American Chemical Society Journals</source><creator>Lei, Chenping ; Zhou, Kewei ; Zheng, Jingyan ; Zhao, Miao ; Huang, Yan ; He, Huaqin ; Yang, Shiping ; Zhang, Ziding</creator><creatorcontrib>Lei, Chenping ; Zhou, Kewei ; Zheng, Jingyan ; Zhao, Miao ; Huang, Yan ; He, Huaqin ; Yang, Shiping ; Zhang, Ziding</creatorcontrib><description>Plant–pathogen protein–protein interactions (PPIs) play crucial roles in the arm race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. Computational prediction methods can complement experimental efforts, but their predictive performance still needs to be improved. 
Motivated by the rapid development of natural language processing and its successful applications in the field of protein bioinformatics, here we present an improved XGBoost-based plant–pathogen PPI predictor (i.e., AraPathogen2.0), in which sequence encodings from the pretrained protein language model ESM2 and Arabidopsis PPI network-related node representations from the graph embedding technique struc2vec are used as input. Stringent benchmark experiments showed that AraPathogen2.0 could achieve a better performance than its precedent version, especially for processing the test data set with novel proteins unseen in the training data.</description><identifier>ISSN: 1535-3893</identifier><identifier>EISSN: 1535-3907</identifier><identifier>DOI: 10.1021/acs.jproteome.3c00364</identifier><identifier>PMID: 38069805</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><ispartof>Journal of proteome research, 2024-01, Vol.23 (1), p.494-499</ispartof><rights>2023 American Chemical Society</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a299t-18bdfdb6a00bacf67455d9db7ad5f4932d9604d638ff0155f6af1df13bd809c63</cites><orcidid>0009-0001-0938-4042 ; 0000-0002-9296-571X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.3c00364$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jproteome.3c00364$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38069805$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lei, Chenping</creatorcontrib><creatorcontrib>Zhou, Kewei</creatorcontrib><creatorcontrib>Zheng, 
Jingyan</creatorcontrib><creatorcontrib>Zhao, Miao</creatorcontrib><creatorcontrib>Huang, Yan</creatorcontrib><creatorcontrib>He, Huaqin</creatorcontrib><creatorcontrib>Yang, Shiping</creatorcontrib><creatorcontrib>Zhang, Ziding</creatorcontrib><title>AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique</title><title>Journal of proteome research</title><addtitle>J. Proteome Res</addtitle><description>Plant–pathogen protein–protein interactions (PPIs) play crucial roles in the arm race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. Computational prediction methods can complement experimental efforts, but their predictive performance still needs to be improved. Motivated by the rapid development of natural language processing and its successful applications in the field of protein bioinformatics, here we present an improved XGBoost-based plant–pathogen PPI predictor (i.e., AraPathogen2.0), in which sequence encodings from the pretrained protein language model ESM2 and Arabidopsis PPI network-related node representations from the graph embedding technique struc2vec are used as input. 
Stringent benchmark experiments showed that AraPathogen2.0 could achieve a better performance than its precedent version, especially for processing the test data set with novel proteins unseen in the training data.</description><issn>1535-3893</issn><issn>1535-3907</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqFUc1uEzEQthCIlsIjgHzkknS8Xu-uuUVVCpGikkM5r7z2ONkqawfbW9Qbj1CJN-yT1GmSXjnNaPz96PNHyGcGUwYFu1Q6Tu92wSf0A065BuBV-YacM8HFhEuo3572RvIz8iHGOwAmauDvyRlvoJINiHPyOAtqpdLGr9EVU_hGZ44uhqx7j4auAppep9476i1dbZVLT3__neD5Obv3bn86bHThEgb1woh0Puz8H8wStHugaYP0RqUxqC1dKrce1Rr3Ahpj7N2a3qLeuP73iB_JO6u2ET8d5wX5dT2_vfoxWf78vriaLSeqkDJNWNMZa7pKAXRK26ouhTDSdLUywpaSF0ZWUJqKN9bm2MJWyjJjGe9MA1JX_IJ8PejmrNk2pnboo8ZtDol-jG0hoZA151BmqDhAdfAxBrTtLvSDCg8tg3ZfRpvLaF_LaI9lZN6Xo8XYDWheWaffzwB2ALzw_RhcTvwf0WcwIZ-S</recordid><startdate>20240105</startdate><enddate>20240105</enddate><creator>Lei, Chenping</creator><creator>Zhou, Kewei</creator><creator>Zheng, Jingyan</creator><creator>Zhao, Miao</creator><creator>Huang, Yan</creator><creator>He, Huaqin</creator><creator>Yang, Shiping</creator><creator>Zhang, Ziding</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0009-0001-0938-4042</orcidid><orcidid>https://orcid.org/0000-0002-9296-571X</orcidid></search><sort><creationdate>20240105</creationdate><title>AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique</title><author>Lei, Chenping ; Zhou, Kewei ; Zheng, Jingyan ; Zhao, Miao ; Huang, Yan ; He, Huaqin ; Yang, Shiping ; Zhang, 
Ziding</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a299t-18bdfdb6a00bacf67455d9db7ad5f4932d9604d638ff0155f6af1df13bd809c63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lei, Chenping</creatorcontrib><creatorcontrib>Zhou, Kewei</creatorcontrib><creatorcontrib>Zheng, Jingyan</creatorcontrib><creatorcontrib>Zhao, Miao</creatorcontrib><creatorcontrib>Huang, Yan</creatorcontrib><creatorcontrib>He, Huaqin</creatorcontrib><creatorcontrib>Yang, Shiping</creatorcontrib><creatorcontrib>Zhang, Ziding</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of proteome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lei, Chenping</au><au>Zhou, Kewei</au><au>Zheng, Jingyan</au><au>Zhao, Miao</au><au>Huang, Yan</au><au>He, Huaqin</au><au>Yang, Shiping</au><au>Zhang, Ziding</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique</atitle><jtitle>Journal of proteome research</jtitle><addtitle>J. Proteome Res</addtitle><date>2024-01-05</date><risdate>2024</risdate><volume>23</volume><issue>1</issue><spage>494</spage><epage>499</epage><pages>494-499</pages><issn>1535-3893</issn><eissn>1535-3907</eissn><abstract>Plant–pathogen protein–protein interactions (PPIs) play crucial roles in the arm race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. 
Computational prediction methods can complement experimental efforts, but their predictive performance still needs to be improved. Motivated by the rapid development of natural language processing and its successful applications in the field of protein bioinformatics, here we present an improved XGBoost-based plant–pathogen PPI predictor (i.e., AraPathogen2.0), in which sequence encodings from the pretrained protein language model ESM2 and Arabidopsis PPI network-related node representations from the graph embedding technique struc2vec are used as input. Stringent benchmark experiments showed that AraPathogen2.0 could achieve a better performance than its precedent version, especially for processing the test data set with novel proteins unseen in the training data.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>38069805</pmid><doi>10.1021/acs.jproteome.3c00364</doi><tpages>6</tpages><orcidid>https://orcid.org/0009-0001-0938-4042</orcidid><orcidid>https://orcid.org/0000-0002-9296-571X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1535-3893
ispartof Journal of proteome research, 2024-01, Vol.23 (1), p.494-499
issn 1535-3893
1535-3907
language eng
recordid cdi_proquest_miscellaneous_2902973304
source American Chemical Society Journals
title AraPathogen2.0: An Improved Prediction of Plant–Pathogen Protein–Protein Interactions Empowered by the Natural Language Processing Technique
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T16%3A12%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AraPathogen2.0:%20An%20Improved%20Prediction%20of%20Plant%E2%80%93Pathogen%20Protein%E2%80%93Protein%20Interactions%20Empowered%20by%20the%20Natural%20Language%20Processing%20Technique&rft.jtitle=Journal%20of%20proteome%20research&rft.au=Lei,%20Chenping&rft.date=2024-01-05&rft.volume=23&rft.issue=1&rft.spage=494&rft.epage=499&rft.pages=494-499&rft.issn=1535-3893&rft.eissn=1535-3907&rft_id=info:doi/10.1021/acs.jproteome.3c00364&rft_dat=%3Cproquest_cross%3E2902973304%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2902973304&rft_id=info:pmid/38069805&rfr_iscdi=true
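The description field above says that sequence encodings (from ESM2) and Arabidopsis network node representations (from struc2vec) are used together as classifier input. The following is a minimal sketch of how such inputs might be assembled into one feature row per protein pair. It is not the authors' implementation: the concatenation order, the function names (`pair_features`, `build_matrix`), the protein identifiers, and the tiny placeholder vectors are all assumptions made for illustration, and the real embeddings would be far longer.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def pair_features(plant_seq: Vector, pathogen_seq: Vector, plant_node: Vector) -> List[float]:
    """Concatenate three embeddings into one flat feature vector.

    Assumed layout (not confirmed by the record):
    [plant sequence encoding | pathogen sequence encoding | plant node representation]
    """
    return [*plant_seq, *pathogen_seq, *plant_node]


def build_matrix(
    pairs: List[Tuple[str, str]],
    seq_emb: Dict[str, Vector],
    node_emb: Dict[str, Vector],
) -> List[List[float]]:
    """Build one feature row per (plant protein, pathogen protein) pair."""
    rows = []
    for plant_id, pathogen_id in pairs:
        rows.append(
            pair_features(seq_emb[plant_id], seq_emb[pathogen_id], node_emb[plant_id])
        )
    return rows


# Hypothetical identifiers and placeholder vectors standing in for real
# sequence encodings and network node representations.
seq_emb = {"plant_P1": [0.1, 0.2, 0.3], "pathogen_E1": [0.4, 0.5, 0.6]}
node_emb = {"plant_P1": [0.7, 0.8]}
pairs = [("plant_P1", "pathogen_E1")]

X = build_matrix(pairs, seq_emb, node_emb)
print(len(X), len(X[0]))
```

A matrix built this way, together with interaction labels, could then be passed to a gradient-boosted tree classifier such as XGBoost; that training step is omitted here because the record gives no details about it.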