Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Protein inverse folding, that is, predicting an amino acid sequence that will
fold into a desired 3D structure, is an important problem for structure-based
protein design. Machine-learning-based methods for inverse folding typically
use recovery of the original sequence as the optimization objective. However,
inverse folding is a one-to-many problem: several sequences can fold to
the same structure. Moreover, for many practical applications, it is often
desirable to have multiple, diverse sequences that fold into the target
structure, since this provides more candidate sequences for downstream
optimization. Here, we demonstrate that although recent inverse folding
methods show increased sequence recovery, their "foldable diversity" (i.e., their
ability to generate multiple non-similar sequences that fold into
structures consistent with the target) does not increase. To address this, we
present RL-DIF, a categorical diffusion model for inverse folding that is
pre-trained on sequence recovery and tuned via reinforcement learning on
structural consistency. We find that RL-DIF achieves sequence
recovery and structural consistency comparable to benchmark models but shows greater
foldable diversity: experiments show that RL-DIF can achieve a foldable diversity
of 29% on CATH 4.2, compared to 23% from models trained on the same dataset.
The PyTorch model weights and sampling code are available on GitHub. |
---|---|
DOI: | 10.48550/arxiv.2410.17173 |
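
The summary describes "foldable diversity" only in words. The sketch below shows one plausible way such a metric could be computed; it is an illustration under stated assumptions, not the paper's definition. It assumes diversity is measured as the fraction of differing positions (normalized Hamming distance) between equal-length sequences, and that a per-sequence structural-consistency flag has already been computed elsewhere (for example, by folding each sample and comparing it to the target). The names `hamming_fraction`, `foldable_diversity`, and `is_consistent` are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def foldable_diversity(samples: list[str], is_consistent: list[bool]) -> float:
    """Mean pairwise Hamming fraction over all sample pairs.

    Assumption (not taken from the source): a pair contributes its
    distance only if BOTH sequences are flagged as structurally
    consistent with the target; every other pair contributes zero.
    """
    if len(samples) != len(is_consistent):
        raise ValueError("one consistency flag is required per sample")
    pairs = list(combinations(range(len(samples)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        if is_consistent[i] and is_consistent[j]:
            total += hamming_fraction(samples[i], samples[j])
    return total / len(pairs)


# Toy example: three sampled sequences for one target structure.
samples = ["AAAA", "AAAC", "CCCC"]
flags = [True, True, False]  # hypothetical precomputed consistency flags
score = foldable_diversity(samples, flags)
```

Under this reading, a model that samples very different sequences scores well only if those sequences also fold consistently with the target, which is why sequence recovery alone does not capture the quantity.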