Overcoming Language Barrier for Scientific Studies via Unsupervised Literature Learning: Case Study on Solar Cell Materials Prediction


Bibliographic Details
Published in: Solar RRL 2024-05, Vol. 8 (10), p. n/a
Main authors: Zhang, Lei, He, Mu, Huang, Endai, Ma, Xiaokang, You, Jiaxue, Jen, Alex Kwan Yue, Liu, Shengzhong (Frank)
Format: Article
Language: English
Description
Summary: Data‐driven materials and chemical studies have been predominantly confined to English‐language databases, posing challenges for researchers in non‐English‐speaking regions to access and comprehend literature and derive scientific insights. Herein, a machine learning approach designed for information extraction and knowledge acquisition in the materials and chemical science realm from non‐English literature databases, requiring minimal human intervention, is presented. The efficacy of the language model is studied through a case study centered on the prediction of solar cell materials using Chinese‐language sources. The unsupervised learning model effectively extracts crucial latent chemical and materials data from non‐English literature resources. Subsequently, the language model successfully identifies existing solar cell materials and forecasts potential candidates from this non‐English corpus. To further validate the suitability of the proposed solar cell material candidates, we conduct ab initio density functional theory calculations to evaluate their structural and optoelectronic properties. The results validate both the efficacy of our language model and the predictability of our approach. This study represents a stride toward comprehensive data‐driven machine learning for materials and chemical predictions, transcending the limitations of English literature. Furthermore, it offers a solution to aid researchers in non‐English‐speaking regions in overcoming language barriers and accessing scientific discoveries. Currently, data‐driven materials and chemical studies primarily rely on English databases, creating challenges for researchers in non‐English‐speaking countries to access and interpret literature, hindering scientific progress. Herein, a machine learning methodology tailored for scientific predictions, exemplified by the prediction of novel solar cell materials sourced from a non‐English literature database, is showcased.
ISSN: 2367-198X
DOI: 10.1002/solr.202301079