MIC-SHAP: An ensemble feature selection method for materials machine learning

Bibliographic details
Published in: Materials today communications, 2023-12, Vol. 37, p. 106910, Article 106910
Main authors: Wang, Junya; Xu, Pengcheng; Ji, Xiaobo; Li, Minjie; Lu, Wencong
Format: Article
Language: English
Description
Abstract: Feature selection plays a significant role in the materials machine learning workflow, but most current work in materials machine learning uses single or stepwise feature selection methods. This work proposes a new ensemble feature selection method, MIC-SHAP, which combines the SHapley Additive exPlanations (SHAP) method and the maximal information coefficient (MIC) method. The effectiveness of the method was evaluated on three different materials datasets collected from publications. The results demonstrate that MIC-SHAP outperforms commonly used feature selection methods, preserving prediction accuracy while greatly reducing model complexity. The highest feature reduction rate is 91.67%, while the R² of 10-fold cross-validation reaches 0.98. MIC-SHAP can quickly select the optimal feature subset, avoiding repeated trials of different feature selection methods. Moreover, it can increase the stability and interpretability of feature selection, supporting the subsequent process of materials design and discovery.
Highlights:
• An ensemble feature selection method is proposed for materials machine learning.
• The aim is to ensure the stability and interpretability of the feature subset.
• Three datasets, six feature selection methods, and seven algorithms are used in the comparison of results.
• Experimental results and statistical hypothesis tests confirm the superiority of the method.
• The proposed method measures feature importance from different perspectives.
ISSN:2352-4928
DOI:10.1016/j.mtcomm.2023.106910