Applying N-gram Alignment Entropy to Improve Feature Decay Algorithms

Bibliographic Details
Published in:	Prague bulletin of mathematical linguistics 2017-06, Vol. 108 (1), pp. 245-256
Main authors:	Poncelas, Alberto; Maillette de Buy Wenniger, Gideon; Way, Andy
Format:	Article
Language:	English
Subjects:	Algorithms; Alignment; Ambiguity; Computation; Czech language; Decay; English language; Entropy; German language; Machine translation; N-Gram language models; Pipelines; Translation methods and strategies; Translations
Online access:	Full text

Description
Summary:	Data selection is a popular step in Machine Translation pipelines. Feature Decay Algorithms (FDA) is a data selection technique that has shown good performance in several tasks. FDA aims to maximize the coverage of n-grams in the test set. However, intuitively, more ambiguous n-grams require more training examples in order to adequately estimate their translation probabilities. This ambiguity can be measured by alignment entropy. In this paper we propose two methods for calculating the alignment entropies for n-grams of any size, which can be used to improve the performance of FDA. We evaluate substituting the n-gram-specific entropy values computed by these methods for the parameters of both the exponential and linear decay factors of FDA. The experiments conducted on German-to-English and Czech-to-English translation demonstrate that the use of alignment entropies can lead to an increase in the quality of the results of FDA.
ISSN:	0032-6585, 1804-0462
DOI:	10.1515/pralin-2017-0024
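
Note on alignment entropy: the summary says that the ambiguity of an n-gram can be measured by alignment entropy. This record does not reproduce the paper's two methods, so the following is only a minimal sketch of the underlying idea for single words, assuming the entropy is the Shannon entropy of the empirical distribution of target words aligned to a source word. The function name `alignment_entropy` and the toy word pairs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def alignment_entropy(aligned_pairs):
    """Map each source word to the entropy (in bits) of its aligned targets.

    `aligned_pairs` is an iterable of (source_word, target_word) tuples,
    e.g. read off a word-aligned parallel corpus (assumed input format).
    """
    # Count how often each source word is aligned to each target word.
    target_counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        target_counts[source][target] += 1

    entropies = {}
    for source, counts in target_counts.items():
        total = sum(counts.values())
        # Shannon entropy of the empirical distribution p(target | source).
        entropies[source] = sum(
            -(c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies


# Toy data: "bank" has two equally frequent translations, "house" has one.
pairs = [("bank", "Bank"), ("bank", "Ufer"), ("house", "Haus"), ("house", "Haus")]
entropies = alignment_entropy(pairs)
```

In this toy example the ambiguous word receives a higher entropy than the unambiguous one, which matches the intuition in the summary that more ambiguous items need more training examples. How such values are extended to longer n-grams and mapped onto the FDA decay parameters is described in the paper itself and is not assumed here.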