Mapping of the Available Chemical Space versus the Chemical Universe of Lead‐Like Compounds
Published in: | ChemMedChem 2018-03, Vol.13 (6), p.540-554 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | This is, to our knowledge, the most comprehensive analysis to date based on generative topographic mapping (GTM) of fragment‐like chemical space (40 million molecules with no more than 17 heavy atoms, from both the theoretically enumerated GDB‐17 and the real‐world PubChem/ChEMBL databases). The challenge was to prove that a robust map of fragment‐like chemical space can actually be built despite the limited maximal number (≪10⁵) of compounds (the “frame set”) that can be used to fit the GTM manifold. An evolutionary map-building strategy was updated with a “coverage check” step, which discards manifolds that fail to accommodate compounds outside the frame set. The evolved map shows a good propensity to separate actives from inactives for more than 20 external structure–activity sets, and it was shown to properly accommodate the entire collection of 40 million compounds. Next, it served as a library comparison tool to highlight biases of real‐world molecules (PubChem and ChEMBL) versus the universe of all possible species represented by FDB‐17, a fragment‐like subset of GDB‐17 containing 10 million molecules. Patterns present in some libraries and absent from others (diversity holes) were highlighted.
A molecular atlas: A robust generative topographic map of fragment‐like chemical space has been optimized. It accommodates more than 40 million molecules with no more than 17 heavy atoms, from the theoretically enumerated GDB‐17 and the real‐world PubChem/ChEMBL databases. It serves as a library comparison tool to highlight biases in real‐world molecules versus possible species from GDB‐17. Patterns present in some libraries and absent from others are highlighted. |
---|---|
ISSN: | 1860-7179 1860-7187 |
DOI: | 10.1002/cmdc.201700561 |
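The abstract names two GTM operations, the “coverage check” used during map evolution and the library comparison that reveals diversity holes, without giving their formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes every compound has already been projected onto a fitted map, yielding a per-node responsibility vector (summing to 1) and a log-likelihood. The function names `coverage_check`, `node_density` and `diversity_holes`, and all thresholds (`llh_cutoff`, `min_fraction`, `occupied`, `empty`), are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def coverage_check(log_likelihoods: Sequence[float],
                   llh_cutoff: float, min_fraction: float) -> bool:
    """Hypothetical coverage check: keep a candidate manifold only if at least
    min_fraction of compounds outside the frame set are 'accommodated',
    assumed here to mean a log-likelihood at or above llh_cutoff."""
    if not log_likelihoods:
        raise ValueError("need at least one external compound")
    accommodated = sum(1 for llh in log_likelihoods if llh >= llh_cutoff)
    return accommodated / len(log_likelihoods) >= min_fraction


def node_density(responsibilities: Sequence[Sequence[float]]) -> List[float]:
    """Cumulated responsibilities per map node, normalized to sum to 1."""
    n_nodes = len(responsibilities[0])
    totals = [0.0] * n_nodes
    for row in responsibilities:
        for k, r_k in enumerate(row):
            totals[k] += r_k
    total = sum(totals)
    return [t / total for t in totals]


def diversity_holes(reference: Sequence[Sequence[float]],
                    library: Sequence[Sequence[float]],
                    occupied: float = 0.05, empty: float = 0.005) -> List[int]:
    """Indices of nodes well populated by the reference library but nearly
    empty for the compared library (hypothetical density thresholds)."""
    d_ref = node_density(reference)
    d_lib = node_density(library)
    return [k for k, (a, b) in enumerate(zip(d_ref, d_lib))
            if a >= occupied and b <= empty]


if __name__ == "__main__":
    # Toy 4-node map; each row is one compound's responsibility vector.
    reference = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    library = [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
    print(diversity_holes(reference, library))  # [2, 3]
    print(coverage_check([-10.0, -12.0, -30.0, -11.0], -20.0, 0.7))  # True
```

Here the toy reference set stands in for an enumerated library such as FDB‐17 and the second set for a real‐world collection; nodes 2 and 3 are populated only by the reference, so they are reported as holes. Working on normalized node densities rather than raw counts keeps the comparison meaningful between libraries of very different sizes.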