Cost-Effective Knowledge Extraction Framework for Low-Resource Environments

Bibliographic Details
Published in:	IEEE Access, 2024-01, Vol. 12, p. 1-1
Main Authors:	Nam, Sangha; Kim, Eun-kyung
Format:	Article
Language:	English
Subjects:	Annotations; Crowdsourcing; Data mining; Data models; Data processing; Information retrieval; Knowledge; Knowledge Base; Knowledge based systems; Knowledge Extraction; Low-resource Environment; Performance enhancement; Quality control; Task analysis
Online Access:	Full text

Description
Summary:	Extracting knowledge from texts is crucial for enriching everyday knowledge. Constructing a knowledge extraction environment requires comprehensive processes, such as data generation, data processing, and model and framework design. However, these processes require significant effort in low-resource environments where shared data are not published. Currently, there is no environment in which an entire knowledge extraction framework can be designed and step-by-step experiments performed, even with unlimited resources. Thus, this study proposes a method for building a cost-effective knowledge extraction environment. In particular, we present a low-cost, high-quality method for annotating a corpus for knowledge extraction in settings where data sharing is unavailable. The dataset collected using this method improves the performance of the models in a knowledge extraction system. Specifically, co-reference resolution and relation extraction performance improved by 10% and 18.9%, respectively. Additionally, the entire knowledge extraction system was evaluated using sequential multitask learning, and performance improved by 5% as each trained model was introduced.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3394906