ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling
Format: | Article |
---|---|
Language: | English |
Abstract: | Protein language models have demonstrated significant potential in the field
of protein engineering. However, current protein language models primarily
operate at the residue scale, which limits their ability to provide information
at the atom level. This limitation prevents us from fully exploiting the
capabilities of protein language models for applications involving both
proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom),
a novel approach that enables atom-scale and residue-scale unified molecular
modeling. ESM-AA achieves this by pre-training on multi-scale code-switch
protein sequences and utilizing a multi-scale position encoding to capture
relationships among residues and atoms. Experimental results indicate that
ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the
full utilization of protein language models. Further investigations reveal that
through unified molecular modeling, ESM-AA not only gains molecular knowledge
but also retains its understanding of proteins. The source code of ESM-AA is
publicly released at https://github.com/zhengkangjie/ESM-AA. |
DOI: | 10.48550/arxiv.2403.12995 |