Implementing and understanding the unsupervised transfer learning in metal organic framework toward methane adsorption from hypothetical to experimental data

Bibliographic details
Published in: Separation and Purification Technology, 2024-02, Vol. 330, p. 125291, Article 125291
Main authors: Wei, Xin; Lu, Zhanhui; Ai, Yuejie; Shen, Lin; Wei, Mingzhi; Wang, Xiangke
Format: Article
Language: English
Description
Summary:
• A new ML model is developed and implemented to transfer knowledge from a hypothetical MOF database to a synthesized MOF database.
• Unlike many other TL models, reference values on the target synthesized MOF database are no longer required, which is very useful for realistic material discovery.
• Both geometric and chemical descriptors, combined with advanced machine learning techniques, are important for achieving good performance.

In recent years, many research groups have built a broad array of metal-organic framework (MOF) materials databases through computational high-throughput screening. How to use the knowledge extracted from these computational datasets to study experimentally synthesized MOFs is an open and attractive question. Transfer learning (TL) can address this problem because it performs well at compensating for missing information in data. In this work, we adopt an unsupervised TL framework to predict the methane adsorption capability of synthesized MOFs based on a hypothetical MOF database. Unlike many reported supervised TL models, all reference values on the target synthesized MOF database are absent during TL, which is more challenging and more realistic for new material discovery. The best accuracy of TL from hMOF to CoRE MOF is 70%, and that from ToBaCCo MOF to CoRE MOF reaches 86%. The impact of input features and loss functions on TL is discussed in detail. Both geometric descriptors and engineering descriptors play an important role in TL; in particular, the set of engineering descriptors refined by the sure independence screening and sparsifying operator (SISSO) usually performs better than the primary features. In addition, the feature importance analysis helps explain the biases between hypothetical and experimental databases and gives chemical insight into the application of TL. This study offers a practical outlook for applying machine learning to experimental data processing.
ISSN: 1383-5866, 1873-3794
DOI: 10.1016/j.seppur.2023.125291