Question-answering Forestry Pre-trained Language Model: ForestBERT
Published in: Linye kexue (1979) 2024-01, Vol.60 (9), p.99
Main authors: , , , ,
Format: Article
Language: Chinese
Subjects:
Online access: Full text
Abstract: 【Objective】To address the problems of low utilization of forestry texts, insufficient understanding of forestry knowledge by general-domain pre-trained language models, and the time-consuming nature of data annotation, this study makes full use of massive forestry texts, proposes a pre-trained language model integrating forestry domain knowledge, and efficiently realizes forestry extractive question answering by automatically annotating the training data, so as to provide intelligent information services for forestry decision-making and management.【Method】First, a forestry corpus was constructed using web crawler technology, encompassing three topics: terminology, law, and literature. This corpus was used to further pre-train the general-domain pre-trained language model BERT. Through self-supervised learning on the masked language model and next sentence prediction tasks, BERT was able to effectively learn forestry semantic information, resulting in the pre-trained language model ForestBERT, which has general…
ISSN: 1001-7488
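
The 【Method】 passage describes further pre-training a general-domain BERT on a crawled forestry corpus through self-supervised masked language modeling and next sentence prediction. As a rough illustration of the masked-language-modeling half, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the checkpoint bert-base-chinese, the file forestry_corpus.txt, and all hyperparameters are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: further pre-train a general-domain BERT on a forestry
# corpus with the masked-language-model objective. Checkpoint, file path,
# and hyperparameters are illustrative assumptions, not the paper's values.
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# Hypothetical corpus file: one forestry document (terminology, law, or
# literature text) per line.
dataset = load_dataset("text", data_files={"train": "forestry_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens; the model is trained to reconstruct them,
# which is how it absorbs forestry semantics during further pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="forestbert-mlm",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Reproducing the second self-supervised objective, next sentence prediction, would additionally require BertForPreTraining with sentence-pair inputs and NSP labels; the MLM step alone shows how the further pre-training that yields a model like ForestBERT is typically set up.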