Activity Cliff Prediction: Dataset and Benchmark
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Activity cliffs (ACs), which are generally defined as pairs of structurally
similar molecules that are active against the same bio-target but differ
significantly in binding potency, are of great importance to drug discovery. To
date, the AC prediction problem, i.e., predicting whether a pair of
molecules exhibits the AC relationship, has not yet been fully explored. In this
paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet
curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including
over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model
development and evaluation. Then, we propose a baseline framework to benchmark
the predictive performance of molecular representations encoded by deep neural
networks for AC prediction, and 16 models are evaluated in experiments. Our
experimental results show that deep learning models can achieve good
performance when trained on tasks with an adequate amount of data,
while the imbalanced, low-data and out-of-distribution characteristics of the ACNet
dataset still pose challenges for deep neural networks. In
addition, the traditional ECFP method shows a natural advantage on MMP-cliff
prediction, and outperforms the deep learning models on most of the data
subsets. To the best of our knowledge, our work constructs the first
large-scale dataset for AC prediction, which may stimulate the study of AC
prediction models and prompt further breakthroughs in AI-aided drug discovery.
The code and dataset are available at https://drugai.github.io/ACNet/. |
---|---|
DOI: | 10.48550/arxiv.2302.07541 |
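
Illustrative note: the summary defines an activity cliff as a pair of structurally similar molecules, active against the same target, whose binding potencies differ significantly. The sketch below shows how such a pair might be labeled once the structural-similarity condition (here, being a matched molecular pair) is already satisfied. It is a minimal sketch, not the ACNet procedure: the `MoleculePair` class, the `is_activity_cliff` function, the 2-log-unit (100-fold) cutoff, and the example SMILES and potency values are all assumptions made for illustration and do not come from the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculePair:
    """Hypothetical container for a matched molecular pair on one target."""
    target: str
    smiles_a: str
    smiles_b: str
    pki_a: float  # assumed potency scale: -log10 of the molar potency
    pki_b: float


# Assumed cutoff, not taken from the record: 2 log units = 100-fold difference.
CLIFF_THRESHOLD_LOG_UNITS = 2.0


def is_activity_cliff(pair: MoleculePair,
                      threshold: float = CLIFF_THRESHOLD_LOG_UNITS) -> bool:
    """Return True when the potency gap of the pair reaches the cutoff."""
    return abs(pair.pki_a - pair.pki_b) >= threshold


# Made-up example pairs; the potency values are not real measurements.
pairs = [
    MoleculePair("T1", "c1ccccc1O", "c1ccccc1N", 8.5, 6.0),  # gap of 2.5
    MoleculePair("T1", "c1ccccc1O", "c1ccccc1C", 7.0, 6.8),  # gap of 0.2
]
labels = [is_activity_cliff(p) for p in pairs]
print(labels)
```

With a labeling rule of this kind, AC prediction becomes a binary classification task over molecule pairs, which is consistent with the summary's count of far fewer cliff pairs than non-cliff pairs and its remark that class imbalance makes the task hard.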