Using Natural Alignment to Extract Translation Equivalents
Main author: | |
---|---|
Format: | Book chapter |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Most methods for extracting bilingual lexicons from parallel corpora learn word correspondences from relatively small aligned segments, called sentences. They therefore need a corpus aligned at the sentence level. Such an alignment can require further manual corrections if the parallel corpus contains insertions, deletions, or fuzzy sentence boundaries. This paper shows that it is possible to extract bilingual lexicons without aligning parallel texts at the sentence level. We describe a method to learn word translations from a very roughly aligned corpus, namely a corpus with quite long segments separated by “natural boundaries”. The results obtained using this method are very close to those obtained using sentence alignment. Experiments were performed on English-Portuguese and English-Spanish parallel texts. |
---|---|
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/11751984_5 |
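
The summary describes learning word translations from co-occurrence in aligned segments, whether those segments are sentences or longer stretches of text. As a rough illustration only, the sketch below scores word pairs with the Dice coefficient over aligned segment pairs. The Dice measure, the whitespace tokenization, the function name `dice_lexicon`, the `min_score` threshold, and the toy English-Portuguese data are all assumptions made for this example; the record does not say which similarity measure or preprocessing the chapter uses.

```python
from collections import Counter
from itertools import product


def dice_lexicon(segment_pairs, min_score=0.5):
    """Score (source word, target word) pairs by Dice coefficient.

    segment_pairs: iterable of (source_segment, target_segment) strings.
    A segment may be a sentence or a much longer span; the counting is the same.
    NOTE: Dice and whitespace tokenization are illustrative assumptions.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_seg, tgt_seg in segment_pairs:
        # Count each word at most once per segment (segment-level occurrence).
        src_words = set(src_seg.lower().split())
        tgt_words = set(tgt_seg.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # Every source word co-occurs with every target word in the aligned segment.
        pair_freq.update(product(src_words, tgt_words))

    scores = {}
    for (src, tgt), joint in pair_freq.items():
        score = 2 * joint / (src_freq[src] + tgt_freq[tgt])
        if score >= min_score:
            scores[(src, tgt)] = score
    return scores


# Hypothetical toy corpus: three aligned English-Portuguese segments.
toy_pairs = [
    ("the house is red", "a casa é vermelha"),
    ("the car is red", "o carro é vermelho"),
    ("the house", "a casa"),
]
lexicon = dice_lexicon(toy_pairs, min_score=0.9)
```

With longer segments, each segment pair contributes many more spurious co-occurrences, so a practical system would need a larger corpus and some filtering step; how the chapter handles this is not stated in the record.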