Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data

Bibliographic details
Published in: Prague Bulletin of Mathematical Linguistics 2014-04, Vol.101 (1), p.43-54
Main authors: de Buy Wenniger, Gideon Maillette, Sima’an, Khalil
Format: Article
Language: English
Description
Abstract: Translation equivalence constitutes the basis of all Machine Translation systems, including the recent hierarchical and syntax-based systems. For hierarchical MT research it is important to have a tool that supports the qualitative and quantitative analysis of hierarchical translation equivalence relations extracted from word alignments in data. In this paper we present such a toolkit and exemplify some of its uses. The main challenges taken up in designing this tool are the efficient and compact, yet complete, representation of hierarchical translation equivalence, coupled with an intuitive visualization of these hierarchical relations. We exploit a new hierarchical representation, called Hierarchical Alignment Trees (HATs), which is based on an extension of the algorithms used for factorizing n-ary branching SCFG rules into their minimally-branching equivalents. Our toolkit further provides a search capability based on hierarchically relevant properties of word alignments and/or translation equivalence relations. Finally, the tool allows detailed statistical analysis of word alignments, thereby providing a breakdown of alignment statistics according to the complexity of translation equivalence units or reordering phenomena. We illustrate this with an empirical study of the coverage of inversion-transduction grammars for a number of corpora enriched with manual or automatic word alignments, followed by a breakdown of corpus statistics by reordering complexity.
ISSN: 0032-6585, 1804-0462
DOI: 10.2478/pralin-2014-0003
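
The abstract's study of inversion-transduction grammar (ITG) coverage turns on whether the reordering in a word alignment can be built up from binary steps that each keep or invert the order of two adjacent blocks. As a rough illustration of that idea only, the sketch below handles the simplest case, a one-to-one alignment written as a permutation of target positions, and uses a greedy shift-reduce check. It is not the toolkit's implementation and does not cover the many-to-many alignments that HATs are described as representing; the function name `is_itg_binarizable` is hypothetical.

```python
def is_itg_binarizable(perm):
    """Return True if the permutation can be built from binary
    straight/inverted merges of adjacent blocks (illustrative sketch).

    perm: list of distinct consecutive integers, e.g. [2, 1, 4, 3],
    giving the target position of each source word.
    """
    stack = []  # each entry is a (lo, hi) range of covered target positions
    for x in perm:
        stack.append((x, x))  # shift: a single word covers one position
        # reduce: merge the top two blocks while their ranges are adjacent
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # fully reduced to one block means a binary derivation exists
    return len(stack) == 1
```

Under these assumptions, `[2, 1, 4, 3]` reduces to a single block, whereas `[2, 4, 1, 3]` and `[3, 1, 4, 2]` leave four unmerged blocks and therefore need a rule with more than two children.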