A Unified Knowledge Extraction Method Based on BERT and Handshaking Tagging Scheme

Bibliographic Details
Published in:	Applied Sciences 2022-07, Vol.12 (13), p.6543
Main Authors:	Yang, Ning; Pun, Sio Hang; Vai, Mang I; Yang, Yifan; Miao, Qingliang
Format:	Article
Language:	English
Subjects:	Benchmarks; BERT; Classification; Datasets; Deep learning; Domains; Electronic health records; Electronic medical records; fixed-domain relation extraction; handshaking tagging scheme; Information processing; Knowledge; knowledge graph; Marking; Methods; named entity recognition; Neural networks; open-domain relation extraction; Patients; Physicians; Signs and symptoms; Tagging; Technology; Unstructured data; Yao Ming; Yi Jianlian
Online Access:	Full text

Description
Summary:	In practical knowledge extraction systems, different applications use different entity classes and relation schemas, so the ability of an extraction model to generalize and transfer across domains is very important. Open-domain knowledge extraction addresses this need by training a model in a source domain and applying it directly to an arbitrary target domain. Traditional knowledge extraction models cannot be transferred directly to new domains and cannot extract undefined relation types. To address these issues, this paper proposes an end-to-end Chinese open-domain knowledge extraction model, TPORE (Extract Open-domain Relations through Token Pair linking), which combines BERT with a handshaking tagging scheme. TPORE can alleviate the nested-entity and nested-relation problems. In addition, the model adopts a new loss function that compares target-category and non-target-category scores pairwise to balance the weight automatically; the experimental results indicate that this loss function improves both speed and performance. Extensive experiments show that the proposed method significantly surpasses strong baselines. Specifically, the approach achieves new state-of-the-art results on the Chinese open relation extraction (ORE) benchmarks COER and SAOKE: on the COER dataset, F1 increased from 66.36% to 79.63%, and on the SpanSAOKE dataset, F1 increased from 46.0% to 54.91%. In the medical domain, the method performs close to the state-of-the-art method on the CMeIE and CMeEE datasets.
ISSN:	2076-3417
DOI:	10.3390/app12136543
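
Note on the loss function: the summary mentions a loss that compares target-category and non-target-category scores pairwise, but it does not give the formula. The sketch below shows one commonly used loss of that general kind, a multi-label cross-entropy in which every target score is pushed above a zero threshold and every non-target score below it. It is an illustrative assumption, not necessarily the exact loss used in TPORE; the function names `logsumexp` and `pairwise_multilabel_loss` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pairwise_multilabel_loss(scores, targets):
    """Illustrative pairwise-comparison loss (assumed form, see note above).

    scores  : raw, unnormalized score for each category
    targets : 1 if the category is a target, 0 otherwise

    Computes log(1 + sum_neg exp(s)) + log(1 + sum_pos exp(-s)).
    The appended 0.0 contributes the "1 +" term and acts as a threshold
    that target scores should exceed and non-target scores should not.
    """
    neg = [s for s, t in zip(scores, targets) if t == 0]
    pos = [-s for s, t in zip(scores, targets) if t == 1]
    return logsumexp(neg + [0.0]) + logsumexp(pos + [0.0])


if __name__ == "__main__":
    # Target scored high, non-targets scored low: small loss.
    print(pairwise_multilabel_loss([5.0, -5.0, -5.0], [1, 0, 0]))
    # Target scored low, non-targets scored high: large loss.
    print(pairwise_multilabel_loss([-5.0, 5.0, 5.0], [1, 0, 0]))
```

Because the two sums are taken over the target and non-target sets separately, a loss of this form needs no hand-tuned class weights even when non-target categories greatly outnumber targets, which is the typical situation when tagging token pairs.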