Protein Representation Learning by Capturing Protein Sequence-Structure-Function Relationship
Saved in:
Main authors: , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: The goal of protein representation learning is to extract knowledge from protein databases that can be applied to various protein-related downstream tasks. Although protein sequence, structure, and function are the three key modalities for a comprehensive understanding of proteins, existing methods for protein representation learning have utilized only one or two of these modalities due to the difficulty of capturing the asymmetric interrelationships between them. To account for this asymmetry, we introduce our novel asymmetric multi-modal masked autoencoder (AMMA). AMMA adopts (1) a unified multi-modal encoder to integrate all three modalities into a unified representation space and (2) asymmetric decoders to ensure that sequence latent features reflect structural and functional information. The experiments demonstrate that the proposed AMMA is highly effective in learning protein representations that exhibit well-aligned inter-modal relationships, which in turn makes it effective for various downstream protein-related tasks.
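The two components named in the abstract — a unified encoder mapping all three modalities into one latent space, and asymmetric decoders that reconstruct structure and function from the sequence latent — can be sketched loosely as follows. This is an illustration of the asymmetric-decoding idea only, not the authors' implementation: the dimensions, the linear projections standing in for the paper's actual encoder/decoder networks, and all variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not from the paper).
D_SEQ, D_STR, D_FUN, D_LATENT = 32, 24, 16, 8

# Unified multi-modal encoder: one projection per modality into a shared
# latent space (a linear stand-in for the paper's encoder network).
W_seq = rng.normal(size=(D_SEQ, D_LATENT)) / np.sqrt(D_SEQ)
W_str = rng.normal(size=(D_STR, D_LATENT)) / np.sqrt(D_STR)
W_fun = rng.normal(size=(D_FUN, D_LATENT)) / np.sqrt(D_FUN)

def encode(seq, struct, func):
    """Project each modality's features into the common latent space."""
    return seq @ W_seq, struct @ W_str, func @ W_fun

# Asymmetric decoders: structure and function are reconstructed FROM the
# sequence latent, forcing sequence features to carry structural and
# functional information; the reverse direction is not decoded.
Dec_str = rng.normal(size=(D_LATENT, D_STR)) / np.sqrt(D_LATENT)
Dec_fun = rng.normal(size=(D_LATENT, D_FUN)) / np.sqrt(D_LATENT)

def reconstruction_loss(seq, struct, func):
    """Mean-squared reconstruction error of structure and function
    features decoded from the sequence latent alone."""
    z_seq, _, _ = encode(seq, struct, func)
    str_hat = z_seq @ Dec_str
    fun_hat = z_seq @ Dec_fun
    return np.mean((str_hat - struct) ** 2) + np.mean((fun_hat - func) ** 2)

# Toy batch of 4 proteins with random per-modality feature vectors.
seq = rng.normal(size=(4, D_SEQ))
struct = rng.normal(size=(4, D_STR))
func = rng.normal(size=(4, D_FUN))
loss = reconstruction_loss(seq, struct, func)
```

Minimizing such a loss over the encoder weights is what would push the sequence latent to reflect the other two modalities; the real model additionally uses masked-autoencoder-style input masking, which is omitted here for brevity.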
DOI: 10.48550/arxiv.2405.06663