The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | In natural language processing, the deep learning revolution has shifted the
focus from conventional hand-crafted symbolic representations to dense inputs,
which are adequate representations learned automatically from corpora. However,
particularly when working with low-resource languages, small amounts of
symbolic lexical resources such as user-generated lexicons are often available
even when gold-standard corpora are not. Yet such additional linguistic information
is often neglected, and recent neural approaches to cross-lingual
tagging typically rely only on word and subword embeddings. While these
representations are effective, our recent work has shown clear benefits of
combining the best of both worlds: integrating conventional lexical information
improves neural cross-lingual part-of-speech (PoS) tagging. However, little is
known about how complementary such additional information is, and to what extent
improvements depend on the coverage and quality of these external resources.
This paper seeks to fill this gap by providing the first thorough analysis of
the contributions of lexical resources for cross-lingual PoS tagging in neural
times. |
---|---|
DOI: | 10.48550/arxiv.1811.08757 |