Decoding functional proteome information in model organisms using protein language models

Bibliographic details
Published in: NAR Genomics and Bioinformatics 2024-09, Vol.6 (3), p.lqae078
Main authors: Barrios-Núñez, Israel; Martínez-Redondo, Gemma I; Medina-Burgos, Patricia; Cases, Ildefonso; Fernández, Rosa; Rojas, Ana M
Format: Article
Language: English
Subjects: Editor's Choice
Online access: Full text
container_end_page
container_issue 3
container_start_page lqae078
container_title NAR genomics and bioinformatics
container_volume 6
creator Barrios-Núñez, Israel
Martínez-Redondo, Gemma I
Medina-Burgos, Patricia
Cases, Ildefonso
Fernández, Rosa
Rojas, Ana M
description Protein language models have been tested and proved to be reliable when used on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three gene ontologies studied, and that they better recover functional information from transcriptomic experiments. The results obtained indicate that these language models are likely to be suitable for large-scale annotation and downstream analyses, and we recommend a guide for their use.
doi_str_mv 10.1093/nargab/lqae078
format Article
pmid 38962255
publisher Oxford University Press (England)
rights The Author(s) 2024. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
toplevel peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0003-0750-9099
date 2024-09-01
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11217674/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11217674/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/38962255
fulltext fulltext
identifier ISSN: 2631-9268
ispartof NAR genomics and bioinformatics, 2024-09, Vol.6 (3), p.lqae078
issn 2631-9268
2631-9268
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11217674
source Oxford Journals Open Access Collection; DOAJ Directory of Open Access Journals; PubMed Central
subjects Editor's Choice
title Decoding functional proteome information in model organisms using protein language models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T20%3A07%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decoding%20functional%20proteome%20information%20in%20model%20organisms%20using%20protein%20language%20models&rft.jtitle=NAR%20genomics%20and%20bioinformatics&rft.au=Barrios-N%C3%BA%C3%B1ez,%20Israel&rft.date=2024-09-01&rft.volume=6&rft.issue=3&rft.spage=lqae078&rft.pages=lqae078-&rft.issn=2631-9268&rft.eissn=2631-9268&rft_id=info:doi/10.1093/nargab/lqae078&rft_dat=%3Cproquest_pubme%3E3075703665%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3075703665&rft_id=info:pmid/38962255&rft_oup_id=10.1093/nargab/lqae078&rfr_iscdi=true