A BERT based Chinese Named Entity Recognition method on ASEAN News

Bibliographic details
Published in: Journal of Physics: Conference Series, 2021-04, Vol. 1848 (1), p. 12101
Authors: Zhuang, Haoyu; Wang, Fu; Bo, Songlin; Huang, Yongzhong
Format: Article
Language: English
Abstract: As the first step toward building a knowledge graph that records information about ASEAN countries, we aim to perform Named Entity Recognition (NER) on Chinese news about ASEAN countries. We employ a bidirectional gated recurrent unit (BiGRU) in place of the LSTM architecture to improve both models' effectiveness and their capability to understand polysemous words. The state-of-the-art word embedding model BERT is also included to generate high-quality word vectors for the NER task. In addition, we propose a similarity-based dataset partition method to help the model learn the polysemy within the Chinese news. Experiments demonstrate that the combination of these improvements benefits the models' performance in identifying different types of named entities.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1848/1/012101
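The abstract states that a bidirectional GRU replaces the LSTM layer but gives no architectural details. As a rough illustration only, here is a minimal, dependency-free Python sketch of a generic BiGRU encoder layer. It assumes the standard GRU gate equations (biases omitted) and per-character input vectors such as BERT outputs. The class names `GRUCell` and `BiGRU` are hypothetical, and this is not the authors' implementation. Each position's output concatenates a left-to-right and a right-to-left hidden state, so every character's representation reflects both its left and right context.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Hypothetical single GRU cell using the standard reset/update-gate equations."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        bound = 1.0 / math.sqrt(hidden_size)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input-to-hidden and hidden-to-hidden weights for the reset gate (r),
        # update gate (z) and candidate state (n); biases omitted for brevity.
        self.w_ir, self.w_hr = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.w_iz, self.w_hz = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.w_in, self.w_hn = init(hidden_size, input_size), init(hidden_size, hidden_size)

    def step(self, x: Vector, h: Vector) -> Vector:
        r = [_sigmoid(a + b) for a, b in zip(_matvec(self.w_ir, x), _matvec(self.w_hr, h))]
        z = [_sigmoid(a + b) for a, b in zip(_matvec(self.w_iz, x), _matvec(self.w_hz, h))]
        hn = _matvec(self.w_hn, h)
        # The reset gate scales how much of the previous state enters the candidate.
        n = [math.tanh(a + ri * b) for a, ri, b in zip(_matvec(self.w_in, x), r, hn)]
        # The update gate interpolates between the candidate n and the previous state h.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        """Process a sequence from a zero initial state, returning every hidden state."""
        h = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


class BiGRU:
    """Hypothetical BiGRU: one GRU reads left-to-right, another right-to-left."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.forward_cell = GRUCell(input_size, hidden_size, rng)
        self.backward_cell = GRUCell(input_size, hidden_size, rng)

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        fwd = self.forward_cell.run(xs)
        # Run the backward cell on the reversed sequence, then re-align to positions.
        bwd = self.backward_cell.run(xs[::-1])[::-1]
        # Concatenate both directions: output size is 2 * hidden_size per position.
        return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for per-character BERT vectors (BERT-base produces 768 dims).
    sentence = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
    features = BiGRU(input_size=8, hidden_size=4)(sentence)
    print(len(features), len(features[0]))  # 5 8
```

A GRU uses two gates instead of the LSTM's three and keeps no separate cell state, so it has fewer parameters per hidden unit. In practice one would use a framework layer such as PyTorch's `torch.nn.GRU(..., bidirectional=True)` and feed its per-character outputs to a tagging layer; the pure-Python version above only makes the data flow explicit.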