Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools

The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. For example, t...

Full description

Bibliographic details
Published in: Journal of proteome research 2024-12, Vol.23 (12), p.5395-5404
Main authors: Degnan, David J.; Strauch, Clayton W.; Obiri, Moses Y.; VonKaenel, Erik D.; Kim, Grace S.; Kershaw, James D.; Novelli, David L.; Pazdernik, Karl TL; Bramer, Lisa M.
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 5404
container_issue 12
container_start_page 5395
container_title Journal of proteome research
container_volume 23
creator Degnan, David J.
Strauch, Clayton W.
Obiri, Moses Y.
VonKaenel, Erik D.
Kim, Grace S.
Kershaw, James D.
Novelli, David L.
Pazdernik, Karl TL
Bramer, Lisa M.
description The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated as useful alternatives for newly studied or understudied species, where databases are incomplete. Here, the relationship extraction performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of having overconnected networks, ML-based NLP methods have lower true positive rates but networks with the closest structures to the target network, and LLM-based NLP methods tend to exist between the two other approaches, with variable performances. The selection of a specific NLP approach should be tied to the needs of a study and text availability, as models varied in performance due to the amount of text provided.
doi_str_mv 10.1021/acs.jproteome.4c00535
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3128750986</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3128750986</sourcerecordid><originalsourceid>FETCH-LOGICAL-a229t-1c9f66869906e586545dd21b01d70ceab5b05c55c3e8bdde29c5fcf6cd5f27cb3</originalsourceid><addsrcrecordid>eNqFkLtSwzAQRTUMDIHAJ8CopHGQ7Mi2SgivzJhAEWqPLK2Dgy0FyYaBin_gD_kSFPJoqXaLc-_OHoROKBlQEtJzId1gvrCmBdPAYCgJYRHbQQfUjyDiJNnd7CmPeujQuTkhlCUk2ke9iLMwTofDA_T5uKyo9M_X93rDY92CFbKtjMYTaN-NfXH4Cmz1BgqX1jR4VAvnKilqLLTC90I-VxpwBsLqSs-CS-E8ORFtZz2SCT3rxAyw75fgc3qGp8bU7gjtlaJ2cLyeffR0cz0d3QXZw-14dJEFIgx5G1DJyzhOY85JDCyN2ZApFdKCUJUQCaJgBWGSMRlBWigFIZeslGUsFSvDRBZRH52ter2t1w5cmzeVk1DXQoPpXB7RME0Y4WnsUbZCpTXOWSjzha0aYT9ySvKl9txrz7fa87V2nztdn-iKBtQ2tfHsAboC_vKms9p__E_pL7ihlvA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3128750986</pqid></control><display><type>article</type><title>Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools</title><source>MEDLINE</source><source>American Chemical Society Journals</source><creator>Degnan, David J. ; Strauch, Clayton W. ; Obiri, Moses Y. ; VonKaenel, Erik D. ; Kim, Grace S. ; Kershaw, James D. ; Novelli, David L. ; Pazdernik, Karl TL ; Bramer, Lisa M.</creator><creatorcontrib>Degnan, David J. ; Strauch, Clayton W. ; Obiri, Moses Y. ; VonKaenel, Erik D. ; Kim, Grace S. ; Kershaw, James D. ; Novelli, David L. ; Pazdernik, Karl TL ; Bramer, Lisa M.</creatorcontrib><description>The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. 
For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated as useful alternatives for newly studied or understudied species, where databases are incomplete. Here, the relationship extraction performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of having overconnected networks, ML-based NLP methods have lower true positive rates but networks with the closest structures to the target network, and LLM-based NLP methods tend to exist between the two other approaches, with variable performances. The selection of a specific NLP approach should be tied to the needs of a study and text availability, as models varied in performance due to the amount of text provided.</description><identifier>ISSN: 1535-3893</identifier><identifier>ISSN: 1535-3907</identifier><identifier>EISSN: 1535-3907</identifier><identifier>DOI: 10.1021/acs.jproteome.4c00535</identifier><identifier>PMID: 39526844</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>COVID-19 - metabolism ; COVID-19 - virology ; Data Mining - methods ; Databases, Protein ; Humans ; Machine Learning ; Natural Language Processing ; Protein Interaction Mapping - methods ; Protein Interaction Maps ; SARS-CoV-2 - metabolism</subject><ispartof>Journal of proteome research, 2024-12, Vol.23 (12), p.5395-5404</ispartof><rights>2024 American Chemical 
Society</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a229t-1c9f66869906e586545dd21b01d70ceab5b05c55c3e8bdde29c5fcf6cd5f27cb3</cites><orcidid>0000-0002-3990-5662 ; 0000-0002-8384-1926 ; 0000-0001-5737-7173 ; 0000-0002-8933-7413 ; 0009-0006-3585-8690</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.4c00535$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jproteome.4c00535$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39526844$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Degnan, David J.</creatorcontrib><creatorcontrib>Strauch, Clayton W.</creatorcontrib><creatorcontrib>Obiri, Moses Y.</creatorcontrib><creatorcontrib>VonKaenel, Erik D.</creatorcontrib><creatorcontrib>Kim, Grace S.</creatorcontrib><creatorcontrib>Kershaw, James D.</creatorcontrib><creatorcontrib>Novelli, David L.</creatorcontrib><creatorcontrib>Pazdernik, Karl TL</creatorcontrib><creatorcontrib>Bramer, Lisa M.</creatorcontrib><title>Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools</title><title>Journal of proteome research</title><addtitle>J. Proteome Res</addtitle><description>The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. 
For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated as useful alternatives for newly studied or understudied species, where databases are incomplete. Here, the relationship extraction performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of having overconnected networks, ML-based NLP methods have lower true positive rates but networks with the closest structures to the target network, and LLM-based NLP methods tend to exist between the two other approaches, with variable performances. The selection of a specific NLP approach should be tied to the needs of a study and text availability, as models varied in performance due to the amount of text provided.</description><subject>COVID-19 - metabolism</subject><subject>COVID-19 - virology</subject><subject>Data Mining - methods</subject><subject>Databases, Protein</subject><subject>Humans</subject><subject>Machine Learning</subject><subject>Natural Language Processing</subject><subject>Protein Interaction Mapping - methods</subject><subject>Protein Interaction Maps</subject><subject>SARS-CoV-2 - 
metabolism</subject><issn>1535-3893</issn><issn>1535-3907</issn><issn>1535-3907</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkLtSwzAQRTUMDIHAJ8CopHGQ7Mi2SgivzJhAEWqPLK2Dgy0FyYaBin_gD_kSFPJoqXaLc-_OHoROKBlQEtJzId1gvrCmBdPAYCgJYRHbQQfUjyDiJNnd7CmPeujQuTkhlCUk2ke9iLMwTofDA_T5uKyo9M_X93rDY92CFbKtjMYTaN-NfXH4Cmz1BgqX1jR4VAvnKilqLLTC90I-VxpwBsLqSs-CS-E8ORFtZz2SCT3rxAyw75fgc3qGp8bU7gjtlaJ2cLyeffR0cz0d3QXZw-14dJEFIgx5G1DJyzhOY85JDCyN2ZApFdKCUJUQCaJgBWGSMRlBWigFIZeslGUsFSvDRBZRH52ter2t1w5cmzeVk1DXQoPpXB7RME0Y4WnsUbZCpTXOWSjzha0aYT9ySvKl9txrz7fa87V2nztdn-iKBtQ2tfHsAboC_vKms9p__E_pL7ihlvA</recordid><startdate>20241206</startdate><enddate>20241206</enddate><creator>Degnan, David J.</creator><creator>Strauch, Clayton W.</creator><creator>Obiri, Moses Y.</creator><creator>VonKaenel, Erik D.</creator><creator>Kim, Grace S.</creator><creator>Kershaw, James D.</creator><creator>Novelli, David L.</creator><creator>Pazdernik, Karl TL</creator><creator>Bramer, Lisa M.</creator><general>American Chemical Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-3990-5662</orcidid><orcidid>https://orcid.org/0000-0002-8384-1926</orcidid><orcidid>https://orcid.org/0000-0001-5737-7173</orcidid><orcidid>https://orcid.org/0000-0002-8933-7413</orcidid><orcidid>https://orcid.org/0009-0006-3585-8690</orcidid></search><sort><creationdate>20241206</creationdate><title>Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools</title><author>Degnan, David J. ; Strauch, Clayton W. ; Obiri, Moses Y. ; VonKaenel, Erik D. ; Kim, Grace S. ; Kershaw, James D. ; Novelli, David L. 
; Pazdernik, Karl TL ; Bramer, Lisa M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a229t-1c9f66869906e586545dd21b01d70ceab5b05c55c3e8bdde29c5fcf6cd5f27cb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>COVID-19 - metabolism</topic><topic>COVID-19 - virology</topic><topic>Data Mining - methods</topic><topic>Databases, Protein</topic><topic>Humans</topic><topic>Machine Learning</topic><topic>Natural Language Processing</topic><topic>Protein Interaction Mapping - methods</topic><topic>Protein Interaction Maps</topic><topic>SARS-CoV-2 - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Degnan, David J.</creatorcontrib><creatorcontrib>Strauch, Clayton W.</creatorcontrib><creatorcontrib>Obiri, Moses Y.</creatorcontrib><creatorcontrib>VonKaenel, Erik D.</creatorcontrib><creatorcontrib>Kim, Grace S.</creatorcontrib><creatorcontrib>Kershaw, James D.</creatorcontrib><creatorcontrib>Novelli, David L.</creatorcontrib><creatorcontrib>Pazdernik, Karl TL</creatorcontrib><creatorcontrib>Bramer, Lisa M.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of proteome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Degnan, David J.</au><au>Strauch, Clayton W.</au><au>Obiri, Moses Y.</au><au>VonKaenel, Erik D.</au><au>Kim, Grace S.</au><au>Kershaw, James D.</au><au>Novelli, David L.</au><au>Pazdernik, Karl TL</au><au>Bramer, Lisa M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Protein–Protein Interaction Networks Derived from Classical and 
Machine Learning-Based Natural Language Processing Tools</atitle><jtitle>Journal of proteome research</jtitle><addtitle>J. Proteome Res</addtitle><date>2024-12-06</date><risdate>2024</risdate><volume>23</volume><issue>12</issue><spage>5395</spage><epage>5404</epage><pages>5395-5404</pages><issn>1535-3893</issn><issn>1535-3907</issn><eissn>1535-3907</eissn><abstract>The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated as useful alternatives for newly studied or understudied species, where databases are incomplete. Here, the relationship extraction performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of having overconnected networks, ML-based NLP methods have lower true positive rates but networks with the closest structures to the target network, and LLM-based NLP methods tend to exist between the two other approaches, with variable performances. 
The selection of a specific NLP approach should be tied to the needs of a study and text availability, as models varied in performance due to the amount of text provided.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>39526844</pmid><doi>10.1021/acs.jproteome.4c00535</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-3990-5662</orcidid><orcidid>https://orcid.org/0000-0002-8384-1926</orcidid><orcidid>https://orcid.org/0000-0001-5737-7173</orcidid><orcidid>https://orcid.org/0000-0002-8933-7413</orcidid><orcidid>https://orcid.org/0009-0006-3585-8690</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1535-3893
ispartof Journal of proteome research, 2024-12, Vol.23 (12), p.5395-5404
issn 1535-3893
1535-3907
1535-3907
language eng
recordid cdi_proquest_miscellaneous_3128750986
source MEDLINE; American Chemical Society Journals
subjects COVID-19 - metabolism
COVID-19 - virology
Data Mining - methods
Databases, Protein
Humans
Machine Learning
Natural Language Processing
Protein Interaction Mapping - methods
Protein Interaction Maps
SARS-CoV-2 - metabolism
title Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T23%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Protein%E2%80%93Protein%20Interaction%20Networks%20Derived%20from%20Classical%20and%20Machine%20Learning-Based%20Natural%20Language%20Processing%20Tools&rft.jtitle=Journal%20of%20proteome%20research&rft.au=Degnan,%20David%20J.&rft.date=2024-12-06&rft.volume=23&rft.issue=12&rft.spage=5395&rft.epage=5404&rft.pages=5395-5404&rft.issn=1535-3893&rft.eissn=1535-3907&rft_id=info:doi/10.1021/acs.jproteome.4c00535&rft_dat=%3Cproquest_cross%3E3128750986%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3128750986&rft_id=info:pmid/39526844&rfr_iscdi=true