Combining Cloud-Based Free-Energy Calculations, Synthetically Aware Enumerations, and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization

The hit identification process usually involves the profiling of millions to more recently billions of compounds either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Journal of chemical information and modeling 2020-09, Vol.60 (9), p.4311-4325
Hauptverfasser:	Ghanakota, Phani, Bos, Pieter H, Konze, Kyle D, Staker, Joshua, Marques, Gabriel, Marshall, Kyle, Leswing, Karl, Abel, Robert, Bhat, Sathesh
Format:	Artikel
Sprache:	eng
Schlagworte:	Computational Chemistry Coupling (molecular) Cyclin-dependent kinases Enrichment Enumeration Free energy Kinases Machine learning Mathematical analysis Optimization Perturbation Reagents Screens Workflow
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The hit identification process usually involves the profiling of millions to more recently billions of compounds either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large scale enumeration and cloud-based free energy perturbation (FEP) profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large scale enumeration alone, while simultaneously staying within the bounds of predefined drug-like property space. We can achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted sum QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can (1) provide a 6.4-fold enrichment improvement in identifying
ISSN:	1549-9596 1549-960X
DOI:	10.1021/acs.jcim.0c00120