Ensemble Distillation Approaches for Grammatical Error Correction
Saved in:
Main Authors: , ,
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Abstract: Ensemble approaches are commonly used techniques for improving a
system by combining multiple model predictions. Additionally, these schemes
allow the uncertainty, as well as the source of that uncertainty, to be
derived for the prediction. Unfortunately, these benefits come at a
computational and memory cost. To address this problem, ensemble distillation
(EnD) and, more recently, ensemble distribution distillation (EnDD) have been
proposed; these compress the ensemble into a single model, representing
either the ensemble average prediction or the prediction distribution
respectively. This paper examines the application of both of these
distillation approaches to a sequence prediction task, grammatical error
correction (GEC). This is an important application area for language learning
tasks, as it can yield highly useful feedback to the learner. It is, however,
more challenging than the standard tasks investigated for distillation, as
the prediction of any grammatical correction to a word is highly dependent on
both the input sequence and the generated output history for that word. The
performance of both EnD and EnDD is evaluated on both publicly available GEC
tasks and a spoken language task.
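In the standard formulations from the distillation literature (a minimal sketch; the notation below, with x the source sentence, y_{<t} the generated output history at step t, θ_1, …, θ_M the ensemble members, and φ the distilled student, is assumed here rather than taken from the paper), EnD trains the student to match the ensemble's average per-token prediction, while EnDD trains the student to parameterize a Dirichlet distribution capturing the spread of the individual ensemble predictions:

$$\mathcal{L}_{\mathrm{EnD}}(\phi) = \mathrm{KL}\!\left(\frac{1}{M}\sum_{m=1}^{M}\mathrm{P}(y_t \mid y_{<t}, x;\, \theta_m)\;\Big\|\;\mathrm{P}(y_t \mid y_{<t}, x;\, \phi)\right)$$

$$\mathcal{L}_{\mathrm{EnDD}}(\phi) = -\frac{1}{M}\sum_{m=1}^{M}\ln \mathrm{Dir}\!\big(\boldsymbol{\pi}^{(m)};\, \boldsymbol{\alpha}(y_{<t}, x;\, \phi)\big), \qquad \pi^{(m)}_{w} = \mathrm{P}(y_t = w \mid y_{<t}, x;\, \theta_m)$$

Conditioning on both x and y_{<t} is what makes this sequence setting more demanding than the static classification tasks usually studied for distillation, as the abstract notes.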
DOI: 10.48550/arxiv.2012.07535