A GPT-assisted iterative method for extracting domain knowledge from a large volume of literature of electromagnetic wave absorbing materials with limited manually annotated data

Bibliographic details
Published in: Computational materials science 2025-01, Vol. 246, p. 113431, Article 113431
Main authors: Dai, Dongbo; Zhang, Guangjie; Wei, Xiao; Lin, Yudian; Dai, Mengmeng; Peng, Junjie; Song, Na; Tang, Zheng; Li, Shengzhou; Liu, Jiwei; Xu, Yan; Che, Renchao; Zhang, Huiran
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Presents a training method to address limited data in absorbing materials.
• Develops a knowledge extraction framework to process extensive literature.
• Explores applications of the extracted knowledge, such as trend analysis and knowledge graphs.
Research on electromagnetic wave absorbing materials is an important part of materials science. Each year, a substantial amount of academic literature containing critical information is published in this field. Rapid and effective knowledge extraction from these documents is key to accelerating the field's development, and automated knowledge extraction based on deep learning offers a solution to this challenge. However, deep learning models typically require extensive annotated data for training, which is time-consuming and expensive to obtain in highly specialized subfields. To address this issue, this paper presents a GPT-assisted iterative training method that uses only 30 manually annotated literature abstracts as a training set and ultimately achieves an F1 score of 82.94% for a named entity recognition (NER) model. The effectiveness of this model is demonstrated by comparing it, on a custom dataset, with other large language models commonly used in materials science. We constructed a knowledge extraction framework centered on the resulting NER model and collected literature on electromagnetic wave absorbing materials from the last decade. The extraction and application results demonstrate the practicality of our framework.
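For readers unfamiliar with the metric quoted in the summary, the following is a minimal sketch of how an F1 score for NER can be computed under exact-match entity scoring. It is an illustration only: this record does not state which scoring scheme the paper uses, and the function name `entity_f1`, the `(start, end, label)` representation, and the labels `MATERIAL`, `PROPERTY`, and `METHOD` are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is (start offset, end offset, label).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match entity scoring.

    A predicted entity counts as correct only if its span and its label
    both match a gold entity exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with made-up labels; not taken from the paper's dataset.
gold = {(0, 5, "MATERIAL"), (10, 18, "PROPERTY"), (25, 30, "METHOD")}
predicted = {
    (0, 5, "MATERIAL"),    # correct
    (10, 18, "METHOD"),    # right span, wrong label
    (25, 30, "METHOD"),    # correct
    (40, 44, "MATERIAL"),  # spurious entity
}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

In this toy case two of the four predictions are correct and two of the three gold entities are found, so precision is 2/4, recall is 2/3, and F1 is 4/7.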
ISSN:0927-0256
DOI:10.1016/j.commatsci.2024.113431