Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Diacritics restoration has become a ubiquitous task in the
Latin-alphabet-based, English-dominated Internet language environment. In this
paper, we describe a small-footprint 1D dilated convolution-based approach
that operates at the character level. We find that solutions based on 1D dilated
convolutional neural networks are competitive alternatives to models based on
recursive neural networks or linguistic modeling for the task of diacritics
restoration. Our solution surpasses the performance of similarly sized models
and is also competitive with larger models. A special feature of our solution
is that it even runs locally in a web browser. We also provide a working
example of this browser-based implementation. Our model is evaluated on
different corpora, with emphasis on the Hungarian language. We performed
comparative measurements of the generalization power of the model on
three Hungarian corpora. We also analyzed the errors to understand
the limitations of corpus-based self-supervised training. |
---|---|
DOI: | 10.48550/arxiv.2201.06757 |
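
The summary mentions two ideas that a short example may make more concrete: corpus-based self-supervised training and 1D dilated convolution over characters. The sketch below is not the authors' implementation. It assumes that training pairs are produced by stripping diacritics from ordinary text, and it uses a plain-Python dilated convolution with an illustrative kernel size of 3 and dilations 1, 2, 4, 8. All function names, the kernel size, and the dilation schedule are hypothetical choices made for illustration only.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. Hungarian 'ő' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def make_training_pair(sentence: str) -> tuple[str, str]:
    """Build one (input, target) pair from raw text (assumed training setup).

    The target is normalized to NFC so that, for precomposed letters,
    input and target have the same length and align character by character.
    """
    target = unicodedata.normalize("NFC", sentence)
    return strip_diacritics(target), target


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded ('same') 1D cross-correlation with the given dilation."""
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(signal):
                total += weight * signal[j]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    print(make_training_pair("árvíztűrő tükörfúrógép"))
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 0, 1], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

Under these assumptions, four stacked layers with kernel size 3 cover 31 characters of context per output position, which hints at how a small stack of dilated layers can see a wide window without many parameters. A character-level model of this kind would typically predict, for each input character, which accented variant (if any) to emit.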