CSIML: a cost-sensitive and iterative machine-learning method for small and imbalanced materials data sets

Bibliographic details
Published in: Chemistry Letters 2024-05, Vol.53 (5)
Main authors: Li, Shengzhou; Nakata, Ayako
Format: Article
Language: English
Description
Summary: Materials science research benefits from powerful machine-learning (ML) surrogate models, but it is also limited by ML's implicit requirement for sufficiently large and balanced data distributions. In this paper, we propose a model to obtain more credible results for small and imbalanced materials data sets as well as chemical knowledge. Taking two imbalanced bandgap data sets as examples, we demonstrate the usability and performance of our model compared with common ML models that use normal sampling and resampling methods.
ISSN: 0366-7022, 1348-0715
DOI: 10.1093/chemle/upae090