Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling
Main authors:
Format: Article
Language: eng
Keywords:
Online access: Order full text
Abstract: We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features for the modeling of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. We compare this pretraining strategy with two others: one based on molecular quantum properties (specifically the HOMO-LUMO gap) and one using a self-supervised atom-masking technique. After fine-tuning on Therapeutics Data Commons ADMET datasets, we evaluate the performance improvement across the different models and observe that models pretrained on atomic quantum-mechanical properties generally produce better results. We then analyse the latent representations and observe that the supervised strategies preserve the pretraining information after fine-tuning, and that different pretrainings produce different trends in latent expressivity across layers. Furthermore, we find that models pretrained on atomic quantum-mechanical properties capture more low-frequency Laplacian eigenmodes of the input graph via the attention weights and produce better representations of atomic environments within the molecule. Applying the analysis to a much larger, non-public dataset for microsomal clearance illustrates the generalizability of the studied indicators. In this case the model performances are in accordance with the representation analysis and highlight, especially for masking pretraining and atom-level quantum property pretraining, how model types with similar performance on public benchmarks can perform differently on large-scale pharmaceutical data.
DOI: 10.48550/arxiv.2410.08024
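
The abstract's claim that attention weights capture low-frequency Laplacian eigenmodes of the input graph can be made concrete with a small sketch. The metric below (the fraction of an attention matrix's energy retained after projecting it onto the span of the k lowest-frequency eigenvectors of the combinatorial graph Laplacian), the function name `low_frequency_energy`, and the choice of k are illustrative assumptions for this record, not the authors' exact analysis.

```python
# Minimal sketch (assumed, not taken from the paper): quantify how much of an
# attention matrix lies in the low-frequency subspace of a molecular graph's
# Laplacian. Names and the specific metric are illustrative.
import numpy as np

def low_frequency_energy(attention: np.ndarray, adjacency: np.ndarray, k: int = 4) -> float:
    """Fraction of the attention matrix's energy captured by the k
    lowest-frequency Laplacian eigenvectors of the input graph."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency                      # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(laplacian)              # eigenvalues ascend, so smooth modes come first
    low = eigvecs[:, :k]                                # k lowest-frequency eigenmodes
    projector = low @ low.T
    projected = projector @ attention @ projector       # project rows and columns onto that subspace
    return float(np.linalg.norm(projected) ** 2 / np.linalg.norm(attention) ** 2)

# Toy example: a 4-atom path graph with uniform attention over all atom pairs
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
attn = np.full((4, 4), 0.25)
print(low_frequency_energy(attn, A, k=2))
```

A value near 1 would indicate that the attention pattern is dominated by the smoothest structural modes of the molecular graph, which is one plausible way to read the comparison made in the abstract.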