Identifying domains of applicability of machine learning models for materials science

Bibliographic details
Published in: Nature Communications 2020-09, Vol. 11 (1), p. 4428-4428, Article 4428
Main authors: Sutton, Christopher; Boley, Mario; Ghiringhelli, Luca M.; Rupp, Matthias; Vreeken, Jilles; Scheffler, Matthias
Format: Article
Language: English
Online access: Full text
Description
Abstract: Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.

Machine learning models insufficient for certain screening tasks can still provide valuable predictions in specific sub-domains of the considered materials. Here, the authors introduce a diagnostic tool to detect regions of low expected model error as demonstrated for the case of transparent conducting oxides.
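The idea in the abstract, that an average test error can hide a sub-domain where a model is much more accurate, can be illustrated with a small sketch. The code below is not the authors' implementation: the function name `find_low_error_domain`, the toy data, and the scoring rule (coverage times relative error reduction) are assumptions made for illustration only. It also searches only single-feature threshold rules, whereas subgroup discovery in general considers conjunctions of such conditions.

```python
def find_low_error_domain(features, errors, min_coverage=0.2):
    """Search single-feature threshold rules for a low-error sub-domain.

    Hypothetical helper for illustration. Each candidate rule has the form
    "feature j <= t" or "feature j > t". A rule is scored by
    coverage * (global mean error - sub-domain mean error) / global mean error,
    which is an assumed objective, not one taken from the catalog record.
    """
    n = len(errors)
    global_mean = sum(errors) / n
    if global_mean == 0:
        return None  # no error anywhere, so nothing to improve on
    best = None
    for j in range(len(features[0])):
        # Use every observed value of feature j as a candidate threshold.
        for t in sorted({row[j] for row in features}):
            for op in ("<=", ">"):
                inside = [
                    e for row, e in zip(features, errors)
                    if (row[j] <= t if op == "<=" else row[j] > t)
                ]
                coverage = len(inside) / n
                if not inside or coverage < min_coverage:
                    continue  # skip empty or too-small sub-domains
                sub_mean = sum(inside) / len(inside)
                score = coverage * (global_mean - sub_mean) / global_mean
                if best is None or score > best["score"]:
                    best = {
                        "feature": j,
                        "op": op,
                        "threshold": t,
                        "coverage": coverage,
                        "mean_error": sub_mean,
                        "score": score,
                    }
    return best


# Toy data (made up): absolute errors are small only where feature 0 is low.
features = [
    [0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
    [0.6, 3.0], [0.7, 1.5], [0.8, 4.5], [0.9, 2.5],
]
errors = [0.01, 0.02, 0.01, 0.02, 0.20, 0.25, 0.22, 0.30]

best = find_low_error_domain(features, errors)
print(best)
```

On this toy data the global mean error is dominated by the four high-error samples, while the selected rule isolates the half of the data where the error is roughly an order of magnitude lower. The `min_coverage` parameter is an assumed guard against rules that describe only a handful of samples.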
ISSN: 2041-1723
DOI: 10.1038/s41467-020-17112-9