Cross-lingual document similarity estimation and dictionary generation with comparable corpora

Bibliographic details
Published in:	Knowledge and Information Systems 2019-03, Vol. 58 (3), p. 729-743
Main authors:	Štajner, Tadej; Mladenić, Dunja
Format:	Article
Language:	English
Subjects:	Bilingualism; Computer Science; Data Mining and Knowledge Discovery; Database Management; Dictionaries; Information Storage and Retrieval; Information systems; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Judgments; Language; Short Paper; Similarity; Translations
Online access:	Full text

Description
Summary:	This paper proposes an approach for performing bilingual dictionary generation even when trained on widely available comparable bilingual corpora. We also show its capability to provide cross-lingual similarity estimates that correlate well with human judgments. We implement an approach using a nonlinear bilingual translation model that we train using comparable corpora. We propose a method using word embeddings and kernel approximation to train scalable nonlinear transformations. We demonstrate that this novel method works better on a majority of evaluated language pairs.
ISSN:	0219-1377 0219-3116
DOI:	10.1007/s10115-018-1179-9