Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | This paper describes our submission to the WMT20 sentence filtering task. We
combine scores from (1) a custom LASER model built for each source language, (2) a
classifier built to distinguish positive and negative pairs by semantic
alignment, and (3) the original scores included in the task devkit. For the
mBART fine-tuning setup provided by the organizers, our method shows 7% and 5%
relative improvement over the baseline in sacreBLEU score on the test set for
Pashto and Khmer, respectively. |
DOI: | 10.48550/arxiv.2011.07933 |
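
The summary names three score sources but does not say how they are combined. As a rough illustration only, the sketch below min-max normalizes each score list and takes a weighted average, then ranks sentence pairs by the result. The function names, the equal weights, and the toy score values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(score_sets: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted average of min-max normalized score lists (assumed scheme)."""
    if len(score_sets) != len(weights):
        raise ValueError("need exactly one weight per score list")
    if len({len(s) for s in score_sets}) != 1:
        raise ValueError("all score lists must cover the same sentence pairs")
    normalized = [min_max(s) for s in score_sets]
    total = sum(weights)
    n_pairs = len(normalized[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normalized)) / total
        for i in range(n_pairs)
    ]


# Toy scores for three sentence pairs (made-up values, one list per source).
laser = [0.91, 0.40, 0.75]        # custom LASER similarity
classifier = [0.80, 0.10, 0.95]   # semantic-alignment classifier
devkit = [1.05, 0.70, 1.10]       # original devkit score

combined = combine_scores([laser, classifier, devkit], [1.0, 1.0, 1.0])

# Indices of sentence pairs, best combined score first.
ranking = sorted(range(len(combined)), key=combined.__getitem__, reverse=True)
print(ranking)
```

Normalizing first keeps a source with a wider numeric range from dominating the average. In practice the weights would be tuned on held-out translation quality rather than set equal.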