Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach

Artificial intelligence (AI) is the key to mining and enhancing the value of big data, and knowledge graph is one of the important cornerstones of artificial intelligence, which is the core foundation for the integration of statistical and physical representations. Named entity recognition is a fund...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of earth science (Wuhan, China) China), 2023-10, Vol.34 (5), p.1406-1417
Hauptverfasser: Qiu, Qinjun, Tian, Miao, Xie, Zhong, Tan, Yongjian, Ma, Kai, Wang, Qingfang, Pan, Shengyong, Tao, Liufeng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Artificial intelligence (AI) is the key to mining and enhancing the value of big data, and knowledge graph is one of the important cornerstones of artificial intelligence, which is the core foundation for the integration of statistical and physical representations. Named entity recognition is a fundamental research task for building knowledge graphs, which needs to be supported by a high-quality corpus, and currently there is a lack of high-quality named entity recognition corpus in the field of geology, especially in Chinese. In this paper, based on the conceptual structure of geological ontology and the analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated based on real geological reports. The total number of words annotated was 698 512 and the number of entities was 23 345. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and the consistency of corpus annotation. Based on this corpus, a Lite Bi-directional Encoder Representations from Transformers (ALBERT)- Bi-directional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) and ALBERT-BiLSTM models are selected for experiments, and the results show that the F1-scores of the recognition performance of the two models reach 0.75 and 0.65 respectively, providing a corpus basis and technical support for information extraction in the field of geology.
ISSN:1674-487X
1867-111X
DOI:10.1007/s12583-022-1789-8