Machine Learned Model for Solid Form Volume Estimation Based on Packing-Accessible Surface and Molecular Topological Fragments


Bibliographic details
Published in: The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory, 2020-12, Vol. 124 (49), p. 10330-10345
Main authors: Bier, Imanuel; Marom, Noa
Format: Article
Language: English
Description
Abstract: We present a machine learned model for predicting the volume of a homomolecular crystal based on the single-molecule structure, implemented in the open-source Python package for Molecular Volume Estimation (PyMoVE). The model is based on two descriptors: the volume enclosed by the packing-accessible surface and molecular topological fragments. To calculate the volume enclosed by the molecular surface, we have developed a new “projected marching cubes” algorithm. The new algorithm achieves a higher accuracy with a smaller number of elements than the traditional marching cubes algorithm, the marching tetrahedron variant, and Monte Carlo methods. The packing-accessible surface is then calculated using an optimized probe radius. The molecular topological fragments are used to construct a representation that captures the bonding environments of the atoms in the molecule. Feature selection is used to determine which fragments to include in the model. The accuracy and robustness of the model may be attributed to including both geometric and chemical features. The volume enclosed by the packing-accessible surface accounts for the presence of voids and sterically hindered regions as well as for the effect of conformational changes. The molecular topological fragments account for the effect of intermolecular interactions on the packing density. The model is trained on a dataset of structures extracted from the Cambridge Structural Database. Excellent performance is demonstrated for three validation sets of unseen data.
ISSN: 1089-5639, 1520-5215
DOI: 10.1021/acs.jpca.0c06791
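
Illustrative note: the abstract names Monte Carlo methods as one of the baselines for computing the volume enclosed by a molecular surface. The sketch below shows that general idea only, under stated assumptions: the molecule is modelled as a union of spheres, each radius is inflated by a `probe_radius`, and points are sampled uniformly in a bounding box. The function name, its parameters, and the radii used in the example are hypothetical. This is not the PyMoVE API, not the paper's "projected marching cubes" algorithm, and not necessarily the paper's definition of the packing-accessible surface.

```python
import math
import random


def union_of_spheres_volume(centers, radii, probe_radius=0.0,
                            n_samples=100_000, seed=0):
    """Monte Carlo estimate of the volume enclosed by a union of spheres.

    Assumption: every sphere radius is simply inflated by `probe_radius`.
    This is a simplified stand-in for a probe-based surface, not the
    definition used in the paper.
    """
    rng = random.Random(seed)
    inflated = [r + probe_radius for r in radii]

    # Axis-aligned bounding box that contains every inflated sphere.
    lo = [min(c[k] - r for c, r in zip(centers, inflated)) for k in range(3)]
    hi = [max(c[k] + r for c, r in zip(centers, inflated)) for k in range(3)]
    box_volume = math.prod(h - l for l, h in zip(lo, hi))

    # Count uniformly sampled points that fall inside at least one sphere.
    hits = 0
    for _ in range(n_samples):
        p = [rng.uniform(lo[k], hi[k]) for k in range(3)]
        if any(sum((p[k] - c[k]) ** 2 for k in range(3)) <= r * r
               for c, r in zip(centers, inflated)):
            hits += 1

    # Fraction of hits times box volume approximates the enclosed volume.
    return box_volume * hits / n_samples


if __name__ == "__main__":
    # Two overlapping spheres with made-up coordinates and radii (angstroms).
    centers = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
    radii = [1.7, 1.7]
    print(union_of_spheres_volume(centers, radii, probe_radius=0.5))
```

The statistical error of such an estimate shrinks only with the square root of the number of samples, which is consistent with the abstract's statement that a surface-based method can reach higher accuracy with fewer elements.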