Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
Main authors: | , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Protein-ligand structure prediction is an essential task in drug discovery,
predicting the binding interactions between small molecules (ligands) and
target proteins (receptors). Recent advances have incorporated deep learning
techniques to improve the accuracy of protein-ligand structure prediction.
Nevertheless, experimental validation of docking conformations remains
costly, and the resulting scarcity of training data raises concerns about the
generalizability of these deep learning-based methods. In this work, we show
that by pre-training on a large-scale set of docking conformations generated by
traditional physics-based docking tools and then fine-tuning with a limited set
of experimentally validated receptor-ligand complexes, we can obtain a
protein-ligand structure prediction model with outstanding performance.
Specifically, this process involved the generation of 100 million docking
conformations for protein-ligand pairings, an endeavor consuming roughly 1
million CPU core days. The proposed model, HelixDock, aims to acquire the
physical knowledge encapsulated by the physics-based docking tools during the
pre-training phase. HelixDock has been rigorously benchmarked against both
physics-based and deep learning-based baselines, demonstrating its exceptional
precision and robust transferability in predicting binding conformations. In
addition, our investigation reveals the scaling laws governing pre-trained
protein-ligand structure prediction models, indicating a consistent enhancement
in performance with increases in model parameters and the volume of
pre-training data. Moreover, we applied HelixDock to several drug
discovery-related tasks to validate its practical utility. HelixDock
demonstrates outstanding capabilities on both cross-docking and structure-based
virtual screening benchmarks. |
---|---|
DOI: | 10.48550/arxiv.2310.13913 |