Character Eyes: Seeing Language through Character-Level Taggers
Format: Article
Language: English
Abstract: Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.
DOI: 10.48550/arxiv.1903.05041
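
The abstract describes a now-standard tagging pipeline: a bidirectional character-level LSTM reads each token's characters, its final hidden states form the token representation, and a token-level tagger predicts one POS tag per token. Below is a minimal sketch of that pipeline, assuming a PyTorch implementation; the framework choice, class and variable names, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CharLSTMTagger(nn.Module):
    """Character-level BiLSTM feeding a token-level POS tagger (sketch)."""

    def __init__(self, n_chars, n_tags, char_dim=50, char_hidden=100,
                 tag_hidden=200):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Bidirectional character LSTM: the paper's unit-level analysis
        # inspects the hidden units of this layer, split by direction.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        # Token-level tagger over the character-derived representations.
        self.tag_lstm = nn.LSTM(2 * char_hidden, tag_hidden,
                                bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * tag_hidden, n_tags)

    def forward(self, char_ids):
        # char_ids: (n_tokens, max_token_len) padded character indices
        # for the tokens of a single sentence.
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        # h_n: (2, n_tokens, char_hidden); concatenate the final forward
        # and backward states to get one vector per token.
        tokens = torch.cat([h_n[0], h_n[1]], dim=-1).unsqueeze(0)
        hidden, _ = self.tag_lstm(tokens)   # (1, n_tokens, 2*tag_hidden)
        return self.out(hidden).squeeze(0)  # (n_tokens, n_tags) tag scores

# Example usage with made-up sizes (17 = size of the Universal POS tag set):
tagger = CharLSTMTagger(n_chars=100, n_tags=17)
scores = tagger(torch.randint(0, 100, (8, 12)))  # 8 tokens, up to 12 chars each
```

Under this sketch, the comparative experiment on directional balance would correspond to replacing the single bidirectional `char_lstm` with separate forward and backward LSTMs whose hidden sizes sum to the same total but are split unevenly, so that more units run in one direction than the other.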