A Unified Knowledge Extraction Method Based on BERT and Handshaking Tagging Scheme
Published in: | Applied Sciences 2022-07, Vol.12 (13), p.6543 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | In practical knowledge extraction systems, different applications define different entity classes and relation schemas, so the ability of a knowledge extraction model to generalize and transfer across domains is essential. Open-domain knowledge extraction, which trains a model in a source domain and applies it directly to an arbitrary target domain, is a key way to address these generalization and transfer issues. Traditional knowledge extraction models cannot be transferred directly to new domains, nor can they extract undefined relation types. To deal with these issues, we propose TPORE (Extract Open-domain Relations through Token Pair linking), an end-to-end Chinese open-domain knowledge extraction model that combines BERT with a handshaking tagging scheme. TPORE alleviates the problems of nested entities and nested relations. Additionally, we adopt a new loss function that performs a pairwise comparison of target-category scores and non-target-category scores to balance the weights automatically; the experimental results indicate that this loss function brings speed and performance improvements. Extensive experiments demonstrate that the proposed method significantly surpasses strong baselines. Specifically, our approach achieves new state-of-the-art results on the Chinese open relation extraction (ORE) benchmarks COER and SAOKE. On the COER dataset, F1 increased from 66.36% to 79.63%, and on the SpanSAOKE dataset, F1 increased from 46.0% to 54.91%. In the medical domain, our method achieves performance close to that of the state-of-the-art method on the CMeIE and CMeEE datasets. |
---|---|
ISSN: | 2076-3417 |
DOI: | 10.3390/app12136543 |
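
The abstract describes the loss only as a pairwise comparison of target-category and non-target-category scores and gives no formula. As a rough illustration, the sketch below implements one widely used loss of that kind, a multi-label generalization of softmax cross-entropy; whether TPORE uses exactly this form is an assumption, and the function names `pairwise_multilabel_loss` and `_log1p_sum_exp` are hypothetical.

```python
import math


def _log1p_sum_exp(values):
    """Return log(1 + sum(exp(v) for v in values)), computed stably."""
    # Appending 0.0 accounts for the "1 +" term, since exp(0) == 1.
    vals = list(values) + [0.0]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def pairwise_multilabel_loss(scores, targets):
    """Assumed loss form (not confirmed by the abstract):

        log(1 + sum_{i in non-target} exp(s_i))
      + log(1 + sum_{j in target}     exp(-s_j))

    Multiplying out the two arguments of the logs yields a term
    exp(s_i - s_j) for every (non-target, target) pair, so each
    non-target score is compared with each target score, and both
    are compared with the fixed threshold 0. No per-class weights
    are needed, even when non-target classes vastly outnumber targets.
    """
    non_target = [s for s, t in zip(scores, targets) if t == 0]
    target = [-s for s, t in zip(scores, targets) if t == 1]
    return _log1p_sum_exp(non_target) + _log1p_sum_exp(target)


# Scores for three candidate tags; only the first is a target tag.
well_separated = pairwise_multilabel_loss([10.0, -10.0, -10.0], [1, 0, 0])
badly_separated = pairwise_multilabel_loss([-10.0, 10.0, 10.0], [1, 0, 0])
print(well_separated < badly_separated)
```

In this form, a tag would be predicted whenever its score exceeds 0, which is why the threshold appears inside both logarithms.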