Accurate prediction of band gap of materials using stacking machine learning model


Bibliographic details
Published in: Computational Materials Science, 2022-01, Vol. 201, p. 110899, Article 110899
Main authors: Wang, Teng; Zhang, Kefei; Thé, Jesse; Yu, Hesheng
Format: Article
Language: English
Description
Abstract:
• An extensive dataset (E-AFLOW) containing 21,534 compounds with 206 dimensions is established for band gap prediction.
• The stacking approach is applied to predict material band gaps rapidly and accurately.
• The stacking model outperforms 10 baseline models in band gap prediction.
• The stacking model trained on our E-AFLOW dataset can accurately predict the band gaps of new materials.

Machine-learning prediction of the band gap of semiconductor materials has progressed steadily in recent years, but its performance still needs further optimization. This work applies the stacking approach, which fuses the outputs of multiple baseline models, to further improve band gap regression. Ten baseline models are first optimized to predict the band gap of materials. The outputs of the models with relatively better performance are then used as the input features of the stacking model. The effectiveness of the different models is evaluated on two datasets. The first is a benchmark dataset of 3,896 inorganic compounds with 136 dimensions. The second is a newly established, more complex database (E-AFLOW) of 21,534 compounds with 206 dimensions. The stacking model trained on the E-AFLOW database is then applied to determine the band gaps of different new compounds. Under 5-fold cross-validation, the stacking model achieves the highest R² of all models: 0.920 on the benchmark dataset and 0.917 on the E-AFLOW dataset. On the E-AFLOW dataset, relative to its GBDT, XGB, RF, and LGB input baseline models, the stacking model improves RMSE by 3.06%–17.54%, MAE by 8.12%–33.25%, MAPE by 7.69%–33.33%, and R² by 0.66%–4.44%. In real applications, the stacking model trained on the E-AFLOW dataset predicts the band gaps of 78.57% of the new materials to within ±8.00% of the observed measurements. The minimum deviation between the predicted and observed values is −0.02%, and the maximum is 14.27%. These results demonstrate the strong performance of the stacking approach in band gap regression.
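The abstract reports three kinds of result:

- the metrics RMSE, MAE, MAPE, and R²;
- percentage improvements of the stacking model over its input baseline models;
- the share of new materials predicted within ±8.00% of observed values.

Because the abstract gives no formulas, the sketch below assumes common definitions. Improvement is taken as the relative reduction of RMSE, MAE, and MAPE, and the relative increase of R², each expressed as a percentage of the baseline value. Deviation is the signed percentage 100 · (predicted − observed) / observed. The function names are hypothetical, and any values passed to them in testing are made up, not taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE (%) and R^2; assumes every observed value is non-zero (needed for MAPE)."""
    n = len(y_true)
    err = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in err)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in err) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(err, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


def improvement_pct(stack: Dict[str, float], base: Dict[str, float]) -> Dict[str, float]:
    """Assumed definition: relative reduction of error metrics, relative gain of R^2, in %."""
    out = {k: 100.0 * (base[k] - stack[k]) / base[k] for k in ("RMSE", "MAE", "MAPE")}
    out["R2"] = 100.0 * (stack["R2"] - base["R2"]) / base["R2"]
    return out


def within_tolerance(y_true: Sequence[float], y_pred: Sequence[float],
                     tol_pct: float = 8.0) -> Tuple[float, float, float]:
    """Share (%) of predictions within +/-tol_pct of observed values, plus min and max signed deviation (%)."""
    devs = [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]
    share = 100.0 * sum(abs(d) <= tol_pct for d in devs) / len(devs)
    return share, min(devs), max(devs)
```

MAPE and the percentage deviation both divide by the observed band gap, so they are undefined for compounds with a zero gap. This sketch assumes all observed values are non-zero.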
ISSN: 0927-0256, 1879-0801
DOI: 10.1016/j.commatsci.2021.110899