De novo protein design using geometric vector field networks
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Innovations like protein diffusion have enabled significant progress in de
novo protein design, which is a vital topic in life science. These methods
typically depend on protein structure encoders to model residue backbone
frames, where atoms do not exist. Most prior encoders rely on atom-wise
features, such as angles and distances between atoms, which are not available
in this context. Thus far, only a few simple encoders, such as IPA, have been
proposed for this scenario, which makes frame modeling a bottleneck. In
this work, we propose the Vector Field Network (VFN), which enables network
layers to perform learnable vector computations between coordinates of
frame-anchored virtual atoms, thus achieving a higher capability for modeling
frames. The vector computation operates in a manner similar to a linear layer,
with each input channel receiving 3D virtual atom coordinates instead of scalar
values. The multiple feature vectors output by the vector computation are then
used to update the residue representations and virtual atom coordinates via
attention aggregation. Remarkably, VFN also excels in modeling both frames and
atoms, as the real atoms can be treated as the virtual atoms for modeling,
positioning VFN as a potential universal encoder. In protein diffusion (frame
modeling), VFN exhibits an impressive performance advantage over IPA, excelling
in terms of both designability (67.04% vs. 53.58%) and diversity (66.54% vs.
51.98%). In inverse folding (frame and atom modeling), VFN outperforms the
previous SoTA model, PiFold (54.7% vs. 51.66%), on sequence recovery rate. We
also propose a method of equipping VFN with the ESM model, which surpasses the
previous ESM-based SoTA, LM-Design, by a substantial margin (62.67% vs.
55.65%). |
---|---|
DOI: | 10.48550/arxiv.2310.11802 |
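
The summary describes the vector computation as working like a linear layer whose input channels carry 3D virtual atom coordinates rather than scalars. The NumPy sketch below illustrates that one idea under stated assumptions; it is not the authors' implementation. The function names (`vector_linear`, `to_global`, `pairwise_vector_features`, `random_rotations`), the tensor shapes, and the choice to express neighbor atoms relative to the residue's own virtual atoms are all hypothetical, and the attention aggregation step mentioned in the summary is omitted.

```python
import numpy as np


def vector_linear(weights, coords):
    """Linear map over channels whose inputs are 3D points (assumed form).

    weights: (C_out, K) mixing coefficients, one row per output vector.
    coords:  (..., K, 3) virtual atom coordinates, one 3D point per channel.
    Returns (..., C_out, 3): each output is a weighted sum of input points.
    """
    return np.einsum("ck,...kd->...cd", weights, coords)


def to_global(R, t, local_atoms):
    """Map frame-anchored atoms to global coordinates: x = R q + t.

    R: (N, 3, 3) rotations, t: (N, 3) translations, local_atoms: (N, K, 3).
    """
    return np.einsum("nij,nkj->nki", R, local_atoms) + t[:, None, :]


def pairwise_vector_features(R, t, local_atoms, weights):
    """Feature vectors for every residue pair (i, j), expressed in frame i.

    Atoms of residue j are moved into the local frame of residue i,
    taken relative to residue i's own virtual atoms, then mixed by
    `vector_linear`. Output shape: (N, N, C_out, 3).
    """
    glob = to_global(R, t, local_atoms)                  # (N, K, 3)
    diff = glob[None, :, :, :] - t[:, None, None, :]     # (N_i, N_j, K, 3)
    in_frame_i = np.einsum("iba,ijkb->ijka", R, diff)    # R_i^T (x_j - t_i)
    rel = in_frame_i - local_atoms[:, None, :, :]        # relative to own atoms
    return vector_linear(weights, rel)


def random_rotations(rng, n):
    """Draw n proper rotation matrices via QR decomposition (helper for demos)."""
    out = np.empty((n, 3, 3))
    for idx in range(n):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        q[:, 0] *= np.sign(np.linalg.det(q))             # force det = +1
        out[idx] = q
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_res, n_atoms, n_out = 4, 3, 5
    R = random_rotations(rng, n_res)
    t = rng.normal(size=(n_res, 3))
    atoms = rng.normal(size=(n_res, n_atoms, 3))
    W = rng.normal(size=(n_out, n_atoms))
    feats = pairwise_vector_features(R, t, atoms, W)
    print(feats.shape)
```

Because every quantity is expressed in the local frame of residue i, the resulting vectors do not change when the whole structure is rotated or translated, which is the property a frame-based encoder typically needs.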