Chemical SuperLearner (ChemSL) - An automated machine learning framework for building physical and chemical properties model

Bibliographic details
Published in: Chemical engineering science 2024-07, Vol.294, p.120111, Article 120111
Main authors: Mohan, Balaji; Chang, Junseok
Format: Article
Language: English
Description
Abstract: This study introduces Chemical SuperLearner (ChemSL), a novel automated framework for building interpretable machine-learning models that predict molecular properties from chemical representations. The ChemSL framework helps build a suitable combination of a molecular representation and a data-driven SuperLearner, a stacked ensemble model built from a pool of 40 base learners. The top-ranked base learners are combined by a meta learner that assigns each of them a weight. Three regression benchmark datasets from MoleculeNet (ESOL, FreeSolv, Lipophilicity) were used to compare the ChemSL-generated models against models available in the literature. The ChemSL-generated models achieved superior performance while maintaining interpretability. Finally, the framework's applicability was demonstrated on the Yield Sooting Index (YSI) database from Harvard Dataverse. The resulting model showed excellent predictive capability, highlighting the framework's potential as a tool for researchers in fields including cheminformatics, materials science, drug discovery, and fuel design.
Highlights:
• An automated workflow to build ML-enhanced QSPR models for molecular properties.
• The framework was demonstrated on the ESOL, FreeSolv, and Lipophilicity datasets.
• The generated models outperformed models reported in the literature on these benchmarks.
• The SHAP package was used to showcase the models' explainability.
• The framework was also demonstrated on the YSI database with good accuracy.
ISSN: 0009-2509, 1873-4405
DOI: 10.1016/j.ces.2024.120111
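
The stacked-ensemble idea summarized in the abstract (rank a pool of base learners, keep the top ones, combine them with weights) can be illustrated with a minimal sketch. This is not the ChemSL implementation. The record does not name the base learners, the ranking metric, or the meta learner, so everything below is an assumption made for illustration: a pool of five toy regressors instead of 40, ranking by cross-validated RMSE, inverse-MSE weighting as a simple stand-in for the meta learner, and synthetic two-column "descriptors" instead of a MoleculeNet dataset.

```python
import math
import random
from statistics import mean


class MeanRegressor:
    """Baseline learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mu = mean(y)
        return self

    def predict(self, X):
        return [self.mu for _ in X]


class SingleFeatureLinear:
    """Ordinary least squares on a single descriptor column."""
    def __init__(self, col):
        self.col = col

    def fit(self, X, y):
        xs = [row[self.col] for row in X]
        mx, my = mean(xs), mean(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx > 0 else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.col] for row in X]


class KNNRegressor:
    """Averages the targets of the k nearest training points."""
    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(row, self.X[i]))
            out.append(mean(self.y[i] for i in order[: self.k]))
        return out


def rmse(y_true, y_pred):
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))


def out_of_fold_predictions(make_model, X, y, n_folds=5):
    """Predict each sample with a model that did not see it during fitting."""
    preds = [0.0] * len(X)
    for fold in range(n_folds):
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train = [i for i in range(len(X)) if i % n_folds != fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, model.predict([X[i] for i in test])):
            preds[i] = p
    return preds


def fit_super_learner(pool, X, y, top_k=2):
    # Rank every base learner by its cross-validated RMSE (assumed metric).
    scores = {name: rmse(y, out_of_fold_predictions(make, X, y)) for name, make in pool.items()}
    top = sorted(scores, key=scores.get)[:top_k]
    # Stand-in meta learner: weights proportional to 1 / MSE, normalized to sum to 1.
    raw = {name: 1.0 / (scores[name] ** 2 + 1e-12) for name in top}
    total = sum(raw.values())
    weights = {name: w / total for name, w in raw.items()}
    # Refit the selected learners on all available data.
    models = {name: pool[name]().fit(X, y) for name in top}
    return models, weights, scores


def predict_super_learner(models, weights, X):
    per_model = {name: m.predict(X) for name, m in models.items()}
    return [sum(weights[n] * per_model[n][i] for n in weights) for i in range(len(X))]


# Synthetic data (hypothetical): two descriptor columns and a noisy linear target.
rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(60)]
y = [2.0 * a - 1.0 * b + rng.gauss(0, 0.05) for a, b in X]

pool = {
    "mean": MeanRegressor,
    "linear_x0": lambda: SingleFeatureLinear(0),
    "linear_x1": lambda: SingleFeatureLinear(1),
    "knn3": lambda: KNNRegressor(3),
    "knn7": lambda: KNNRegressor(7),
}
models, weights, scores = fit_super_learner(pool, X, y, top_k=2)
ensemble_pred = predict_super_learner(models, weights, X)
```

Ranking on out-of-fold predictions rather than training-set fit keeps a learner that memorizes the data (for example, a nearest-neighbour model with small k) from being rewarded for it. A full stacking setup would typically also train the meta learner on those out-of-fold predictions; the inverse-MSE weighting used here is only a simple substitute.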