An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties
Published in: | ChemPhysChem 2022-07, p.e202200255 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Summary: | Feature representations, or descriptors, are the chemical language of machines and largely shape the prediction capability, generalizability and interpretability of machine learning models. Developing a generally applicable descriptor is highly warranted for chemists dealing with conventional prediction tasks on sparsely distributed and small datasets. Inspired by the chemist's view of molecules, we present herein an ensemble descriptor, SPOC, curated on the principles of physical organic chemistry, that integrates the Structure and Physicochemical property (SPOC) of a molecule. SPOC can be readily constructed by combining molecular fingerprints, which represent the structure of a given molecule, with molecular physicochemical properties extracted from RDKit or Mordred molecular descriptors. The applicability of SPOC was surveyed across a range of well-structured chemical databases, with machine learning tasks varying from regression to classification. |
---|---|
ISSN: | 1439-7641 |
DOI: | 10.1002/cphc.202200255 |
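
The summary describes SPOC as a concatenation of a structural fingerprint with physicochemical properties taken from RDKit or Mordred. The following is a minimal sketch of that idea only, not the authors' implementation: the choice of a Morgan fingerprint, the 1024-bit length, the three RDKit properties, and the function name `spoc_like_vector` are all assumptions made for illustration.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Assumed settings; the record does not state the fingerprint type or size.
N_BITS = 1024
RADIUS = 2

# Assumed, tiny subset of RDKit physicochemical descriptors.
PROPERTY_FUNCS = [Descriptors.MolWt, Descriptors.MolLogP, Descriptors.TPSA]


def spoc_like_vector(smiles: str) -> np.ndarray:
    """Concatenate a structure fingerprint with physicochemical properties.

    Hypothetical helper illustrating the 'structure + property' idea.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Structure part: Morgan (circular) fingerprint as a 0/1 array.
    generator = rdFingerprintGenerator.GetMorganGenerator(
        radius=RADIUS, fpSize=N_BITS
    )
    structure = generator.GetFingerprintAsNumPy(mol).astype(float)

    # Property part: one scalar per selected descriptor function.
    properties = np.array([func(mol) for func in PROPERTY_FUNCS], dtype=float)

    # Ensemble descriptor: fingerprint bits followed by property values.
    return np.concatenate([structure, properties])


vec = spoc_like_vector("CCO")  # ethanol
print(vec.shape)
```

In practice the property columns have very different scales from the 0/1 fingerprint bits, so scaling them before model training is a common choice; whether and how the original work does this is not stated in this record.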