Mapping of the Available Chemical Space versus the Chemical Universe of Lead‐Like Compounds

Bibliographic details
Published in: ChemMedChem 2018-03, Vol. 13 (6), p. 540-554
Main authors: Lin, Arkadii, Horvath, Dragos, Afonina, Valentina, Marcou, Gilles, Reymond, Jean‐Louis, Varnek, Alexandre
Format: Article
Language: English
Online access: Full text
Description
Summary: This is, to our knowledge, the most comprehensive analysis to date based on generative topographic mapping (GTM) of fragment‐like chemical space (40 million molecules with no more than 17 heavy atoms, both from the theoretically enumerated GDB‐17 and real‐world PubChem/ChEMBL databases). The challenge was to prove that a robust map of fragment‐like chemical space can actually be built, in spite of a limited (≪10⁵) maximal number of compounds (“frame set”) usable for fitting the GTM manifold. An evolutionary map building strategy has been updated with a “coverage check” step, which discards manifolds failing to accommodate compounds outside the frame set. The evolved map has a good propensity to separate actives from inactives for more than 20 external structure–activity sets. It was proven to properly accommodate the entire collection of 40 million compounds. Next, it served as a library comparison tool to highlight biases of real‐world molecules (PubChem and ChEMBL) versus the universe of all possible species represented by FDB‐17, a fragment‐like subset of GDB‐17 containing 10 million molecules. Specific patterns, proper to some libraries and absent from others (diversity holes), were highlighted.

A molecular atlas: A robust generative topographic map of fragment‐like chemical space has been optimized. It accommodates more than 40 million molecules with no more than 17 heavy atoms, from the theoretically enumerated GDB‐17 and real‐world PubChem/ChEMBL databases. It serves as a library comparison tool to highlight biases in real‐world molecules versus possible species from GDB‐17. Specific patterns, proper to some libraries and absent from others, are highlighted.
ISSN: 1860-7179, 1860-7187
DOI: 10.1002/cmdc.201700561