Soft Gazetteers for Low-Resource Named Entity Recognition
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Traditional named entity recognition models use gazetteers (lists of
entities) as features to improve performance. Although modern neural network
models do not require such hand-crafted features for strong performance, recent
work has demonstrated their utility for named entity recognition on English
data. However, designing such features for low-resource languages is
challenging, because exhaustive entity gazetteers do not exist in these
languages. To address this problem, we propose a method of "soft gazetteers"
that incorporates ubiquitously available information from English knowledge
bases, such as Wikipedia, into neural named entity recognition models through
cross-lingual entity linking. Our experiments on four low-resource languages
show an average improvement of 4 points in F1 score. Code and data are
available at https://github.com/neulab/soft-gazetteers. |
---|---|
DOI: | 10.48550/arxiv.2005.01866 |