An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties

Bibliographic details
Published in: ChemPhysChem 2022-07, p.e202200255
Main authors: Yang, Qi; Liu, Yidi; Cheng, Junjie; Li, Yao; Liu, Siyuan; Duan, Yingdong; Zhang, Long; Luo, Sanzhong
Format: Article
Language: English
Online access: Full text
Description
Summary: Feature representations, or descriptors, are the chemical language of machines, and they largely shape the predictive capability, generalizability and interpretability of machine learning models. A generally applicable descriptor is highly desirable for chemists who face conventional prediction tasks on small, sparsely distributed datasets. Inspired by how chemists view molecules, we present an ensemble descriptor, SPOC, built on the principles of physical organic chemistry, that integrates the Structure and Physicochemical property (SPOC) of a molecule. SPOC can be readily constructed by combining molecular fingerprints, which represent the structure of a given molecule, with molecular physicochemical properties extracted from RDKit or Mordred molecular descriptors. The applicability of SPOC was surveyed across a range of well-structured chemical databases, with machine learning tasks ranging from regression to classification.
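The construction described in the summary (a structural fingerprint joined with a vector of physicochemical properties) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `concat_spoc` and `spoc_from_smiles`, the choice of a Morgan fingerprint with radius 2 and 2048 bits, and the use of the full RDKit descriptor list are all assumptions made for illustration. The record does not state which fingerprints or properties the paper actually uses, nor whether any scaling or feature selection is applied.

```python
from typing import List, Sequence


def concat_spoc(fingerprint: Sequence[int], properties: Sequence[float]) -> List[float]:
    """Join a structural fingerprint and a physicochemical property vector.

    Hypothetical helper: plain concatenation is an assumption about how the
    two parts are combined.
    """
    return [float(bit) for bit in fingerprint] + [float(p) for p in properties]


def spoc_from_smiles(smiles: str, n_bits: int = 2048) -> List[float]:
    """Build an illustrative SPOC-style vector for one molecule with RDKit.

    Assumed choices (not taken from the record): Morgan fingerprint,
    radius 2, n_bits bits, plus every descriptor in Descriptors.descList.
    """
    # RDKit is imported lazily so that concat_spoc can be used without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Structure part: a bit vector encoding circular substructures.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)

    # Physicochemical part: one value per RDKit descriptor function.
    props = [fn(mol) for _name, fn in Descriptors.descList]

    return concat_spoc(list(fp), props)
```

With RDKit installed, `spoc_from_smiles("CCO")` would return one flat feature vector whose first `n_bits` entries are the fingerprint and whose remaining entries are descriptor values. Such vectors could then be stacked into a feature matrix for any regression or classification model.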
ISSN: 1439-7641
DOI: 10.1002/cphc.202200255