Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations


Bibliographic details
Published in: Nature Communications 2024-07, Vol.15 (1), p.6170-15, Article 6170
Main authors: Diaz, Daniel J., Gong, Chengyue, Ouyang-Zhang, Jeffrey, Loy, James M., Wells, Jordan, Yang, David, Ellington, Andrew D., Dimakis, Alexandros G., Klivans, Adam R.
Format: Article
Language: English
Online access: Full text
Description
Summary: Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X fewer proteins and has 548X fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies. Engineering stabilized proteins is essential for industrial and pharmaceutical biotechnologies. Here, the authors present Stability Oracle, a graph-transformer framework trained on masked protein microenvironments to predict protein thermodynamic stability, using less training data while achieving improved generalization.
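Note: the summary names Thermodynamic Permutations as a data augmentation but does not define it. The sketch below is an illustration only, under the assumption that the augmentation relies on free energy being a state function: if stability changes from the wild type to several mutants are measured at one position, the change between any two of those amino acids can be inferred by subtraction. The function name `thermodynamic_permutations`, the dictionary layout, and the numeric values are hypothetical and are not taken from the paper or its code.

```python
from itertools import permutations


def thermodynamic_permutations(ddg_from_wt):
    """Infer pairwise stability changes at a single position.

    ddg_from_wt maps an amino acid to the measured ddG of mutating the
    wild-type residue to it (the wild type itself maps to 0.0).
    Assumption: ddG is path independent, so
    ddG(a -> b) = ddG(wt -> b) - ddG(wt -> a).
    The sign convention is whatever the input measurements use.
    """
    inferred = {}
    for a, b in permutations(ddg_from_wt, 2):
        inferred[(a, b)] = ddg_from_wt[b] - ddg_from_wt[a]
    return inferred


# Hypothetical measurements at one position (kcal/mol); "A" is the wild type.
measured = {"A": 0.0, "G": 1.2, "L": -0.5}
augmented = thermodynamic_permutations(measured)

for (src, dst), value in sorted(augmented.items()):
    print(f"{src} -> {dst}: {value:+.2f}")
```

Under this assumption, n amino acids with known values at a position yield n(n-1) ordered pairs, so the three entries above (two measurements plus the wild type) expand to six labeled examples.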
ISSN: 2041-1723
DOI: 10.1038/s41467-024-49780-2