ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting...

Bibliographic details
Main authors: Zheng, Kangjie, Long, Siyu, Lu, Tianyu, Yang, Junwei, Dai, Xinyu, Zhang, Ming, Nie, Zaiqing, Ma, Wei-Ying, Zhou, Hao
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Zheng, Kangjie
Long, Siyu
Lu, Tianyu
Yang, Junwei
Dai, Xinyu
Zhang, Ming
Nie, Zaiqing
Ma, Wei-Ying
Zhou, Hao
description Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.
doi_str_mv 10.48550/arxiv.2403.12995
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2403_12995</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403_12995</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2403_129953</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw1jM0srQ05WTwdw32VXDMydF1LMnPtVLwLc0pydQtTk7MSVUIKMovSc3MU_BJzEsvTUxPVfDNT0nNUUjLL1IIzctMy0xNAYrkpCaX5iQWQeQy89J5GFjTEnOKU3mhNDeDvJtriLOHLtjq-IKizNzEosp4kBPiwU4wJqwCAAeAOxY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling</title><source>arXiv.org</source><creator>Zheng, Kangjie ; Long, Siyu ; Lu, Tianyu ; Yang, Junwei ; Dai, Xinyu ; Zhang, Ming ; Nie, Zaiqing ; Ma, Wei-Ying ; Zhou, Hao</creator><creatorcontrib>Zheng, Kangjie ; Long, Siyu ; Lu, Tianyu ; Yang, Junwei ; Dai, Xinyu ; Zhang, Ming ; Nie, Zaiqing ; Ma, Wei-Ying ; Zhou, Hao</creatorcontrib><description>Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. 
Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.</description><identifier>DOI: 10.48550/arxiv.2403.12995</identifier><language>eng</language><subject>Computer Science - Computational Engineering, Finance, and Science ; Computer Science - Learning ; Quantitative Biology - Biomolecules</subject><creationdate>2024-03</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2403.12995$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2403.12995$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zheng, Kangjie</creatorcontrib><creatorcontrib>Long, Siyu</creatorcontrib><creatorcontrib>Lu, Tianyu</creatorcontrib><creatorcontrib>Yang, Junwei</creatorcontrib><creatorcontrib>Dai, Xinyu</creatorcontrib><creatorcontrib>Zhang, Ming</creatorcontrib><creatorcontrib>Nie, Zaiqing</creatorcontrib><creatorcontrib>Ma, Wei-Ying</creatorcontrib><creatorcontrib>Zhou, Hao</creatorcontrib><title>ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling</title><description>Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. 
This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.</description><subject>Computer Science - Computational Engineering, Finance, and Science</subject><subject>Computer Science - Learning</subject><subject>Quantitative Biology - Biomolecules</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw1jM0srQ05WTwdw32VXDMydF1LMnPtVLwLc0pydQtTk7MSVUIKMovSc3MU_BJzEsvTUxPVfDNT0nNUUjLL1IIzctMy0xNAYrkpCaX5iQWQeQy89J5GFjTEnOKU3mhNDeDvJtriLOHLtjq-IKizNzEosp4kBPiwU4wJqwCAAeAOxY</recordid><startdate>20240305</startdate><enddate>20240305</enddate><creator>Zheng, Kangjie</creator><creator>Long, Siyu</creator><creator>Lu, Tianyu</creator><creator>Yang, Junwei</creator><creator>Dai, Xinyu</creator><creator>Zhang, Ming</creator><creator>Nie, Zaiqing</creator><creator>Ma, Wei-Ying</creator><creator>Zhou, Hao</creator><scope>AKY</scope><scope>ALC</scope><scope>GOX</scope></search><sort><creationdate>20240305</creationdate><title>ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling</title><author>Zheng, Kangjie 
; Long, Siyu ; Lu, Tianyu ; Yang, Junwei ; Dai, Xinyu ; Zhang, Ming ; Nie, Zaiqing ; Ma, Wei-Ying ; Zhou, Hao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2403_129953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computational Engineering, Finance, and Science</topic><topic>Computer Science - Learning</topic><topic>Quantitative Biology - Biomolecules</topic><toplevel>online_resources</toplevel><creatorcontrib>Zheng, Kangjie</creatorcontrib><creatorcontrib>Long, Siyu</creatorcontrib><creatorcontrib>Lu, Tianyu</creatorcontrib><creatorcontrib>Yang, Junwei</creatorcontrib><creatorcontrib>Dai, Xinyu</creatorcontrib><creatorcontrib>Zhang, Ming</creatorcontrib><creatorcontrib>Nie, Zaiqing</creatorcontrib><creatorcontrib>Ma, Wei-Ying</creatorcontrib><creatorcontrib>Zhou, Hao</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Quantitative Biology</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zheng, Kangjie</au><au>Long, Siyu</au><au>Lu, Tianyu</au><au>Yang, Junwei</au><au>Dai, Xinyu</au><au>Zhang, Ming</au><au>Nie, Zaiqing</au><au>Ma, Wei-Ying</au><au>Zhou, Hao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling</atitle><date>2024-03-05</date><risdate>2024</risdate><abstract>Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. 
This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.</abstract><doi>10.48550/arxiv.2403.12995</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2403.12995
ispartof
issn
language eng
recordid cdi_arxiv_primary_2403_12995
source arXiv.org
subjects Computer Science - Computational Engineering, Finance, and Science
Computer Science - Learning
Quantitative Biology - Biomolecules
title ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T07%3A35%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ESM%20All-Atom:%20Multi-scale%20Protein%20Language%20Model%20for%20Unified%20Molecular%20Modeling&rft.au=Zheng,%20Kangjie&rft.date=2024-03-05&rft_id=info:doi/10.48550/arxiv.2403.12995&rft_dat=%3Carxiv_GOX%3E2403_12995%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true