Domain Adaptation of Machine Translation with Crowdworkers
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Although a machine translation model trained with a large in-domain parallel
corpus achieves remarkable results, it still works poorly when no in-domain
data are available. This situation restricts the applicability of machine
translation when the target domain's data are limited. However, there is great
demand for high-quality domain-specific machine translation models for many
domains. We propose a framework that efficiently and effectively collects
parallel sentences in a target domain from the web with the help of
crowdworkers. With the collected parallel data, we can quickly adapt a machine
translation model to the target domain. Our experiments show that the proposed
method can collect target-domain parallel data over a few days at a reasonable
cost. We tested it with five domains, and the domain-adapted model improved
BLEU scores by up to +19.7 points, and by +7.8 points on average, compared to a
general-purpose translation model. |
---|---|
DOI: | 10.48550/arxiv.2210.15861 |
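
The summary gives two BLEU figures. One reading, which is an interpretation and not something this record states explicitly, is that they are the largest per-domain gain and the mean gain across the tested domains. The sketch below shows how such a pair of numbers could be derived from per-domain scores. The function name `summarize_bleu_gains`, the domain labels, and every score are hypothetical placeholders, not values from the paper.

```python
from statistics import mean


def summarize_bleu_gains(baseline, adapted):
    """Return (largest gain, mean gain) of adapted over baseline BLEU.

    Both arguments map a domain name to a BLEU score. The two mappings
    must cover exactly the same domains.
    """
    if baseline.keys() != adapted.keys():
        raise ValueError("both score tables must cover the same domains")
    # Per-domain improvement in BLEU points.
    gains = {domain: adapted[domain] - baseline[domain] for domain in baseline}
    return max(gains.values()), mean(gains.values())


# Placeholder scores for illustration only; NOT results from the paper.
baseline = {"domain_a": 10.0, "domain_b": 20.0, "domain_c": 30.0}
adapted = {"domain_a": 16.0, "domain_b": 23.0, "domain_c": 30.0}

max_gain, mean_gain = summarize_bleu_gains(baseline, adapted)
print(f"largest gain: +{max_gain:.1f}, average gain: +{mean_gain:.1f}")
```

Reporting both numbers is useful because a single large gain in one domain can sit alongside much smaller gains elsewhere, and the mean alone would hide that spread.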