Jointly Embedding Protein Structures and Sequences through Residue Level Alignment
Published in: | PRX Life 2024-11, Vol.2 (4), Article 043013 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | The relationships between protein sequences, structures, and functions are determined by complex codes that scientists aim to decipher. While structures contain key information about proteins' biochemical functions, they are often experimentally difficult to obtain. In contrast, protein sequences are abundant but are a step removed from function. In this paper, we propose residue level alignment (RLA), a self-supervised objective for aligning sequence and structure embedding spaces. By situating sequence and structure encoders within the same latent space, RLA enriches the sequence encoder with spatial information. Moreover, our framework enables us to measure the similarity between a sequence and a structure by comparing their RLA embeddings. We show how RLA similarity scores can be used for binder design by selecting true binders from sets of designed binders. RLA scores are informative even when they are calculated given only the backbone structure of the binder and no binder sequence information, which simulates the information available in many early-stage binder design libraries. RLA performs similarly to benchmark methods and is orders of magnitude faster, making it a valuable new screening tool for binder design pipelines. |
ISSN: | 2835-8279 |
DOI: | 10.1103/PRXLife.2.043013 |
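
The abstract describes aligning per-residue sequence and structure embeddings in a shared latent space and scoring a sequence against a structure by comparing those embeddings, but it does not give the loss or the scoring formula. The following is therefore only a minimal sketch under stated assumptions: cosine similarity averaged over residues as the score, and an InfoNCE-style contrastive loss as a stand-in for the alignment objective. All function names are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def residue_similarity_score(seq_emb: Sequence[Vector],
                             struct_emb: Sequence[Vector]) -> float:
    """Assumed score: mean cosine similarity over matching residue positions."""
    if len(seq_emb) != len(struct_emb) or len(seq_emb) == 0:
        raise ValueError("need two non-empty embeddings of equal length")
    per_residue = [cosine(s, t) for s, t in zip(seq_emb, struct_emb)]
    return sum(per_residue) / len(per_residue)


def residue_contrastive_loss(seq_emb: Sequence[Vector],
                             struct_emb: Sequence[Vector],
                             temperature: float = 0.1) -> float:
    """Assumed objective (InfoNCE-style): the sequence embedding of residue i
    should be closer to the structure embedding of residue i than to the
    structure embedding of any other residue j."""
    n = len(seq_emb)
    if n != len(struct_emb) or n == 0:
        raise ValueError("need two non-empty embeddings of equal length")
    total = 0.0
    for i in range(n):
        logits = [cosine(seq_emb[i], struct_emb[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate residues.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]  # negative log-probability of the true pair
    return total / n


def rank_candidates(candidates: Dict[str, Sequence[Vector]],
                    struct_emb: Sequence[Vector]) -> List[str]:
    """Order candidate sequences from highest to lowest assumed score."""
    return sorted(candidates,
                  key=lambda name: residue_similarity_score(candidates[name],
                                                            struct_emb),
                  reverse=True)
```

In a screening setting, the embeddings would come from trained sequence and structure encoders; here they are plain lists of floats so that the scoring and ranking logic can be read on its own. Lowering `temperature` sharpens the contrast between matched and mismatched residues in the assumed loss.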