Unsupervised group matching with application to cross-lingual topic matching without alignment information

Bibliographic details
Published in:	Data mining and knowledge discovery 2017-03, Vol.31 (2), p.350-370
Main authors:	Iwata, Tomoharu; Kanagawa, Motonobu; Hirao, Tsutomu; Fukumizu, Kenji
Format:	Article
Language:	English
Subjects:	Alignment; Artificial Intelligence; Chemistry and Earth Sciences; Computer Science; Correlation analysis; Correspondence; Data mining; Data Mining and Knowledge Discovery; Hilbert space; Information Storage and Retrieval; Kernels; Matching; Mathematical analysis; Methods; Multilingualism; Ontology; Physics; Probability distribution; Reproduction; Similarity; Statistics for Engineering; Tasks
Online access:	Full text

Description
Abstract:	We propose a method for unsupervised group matching, the task of finding correspondences between groups across different domains without cross-domain similarity measurements or paired data. For example, the proposed method can match topic categories in different languages without alignment information. The method interprets each group as a probability distribution, which makes it possible to handle the uncertainty that comes with a limited amount of data and to incorporate higher-order information about the groups. Groups are matched by maximizing the dependence between distributions, measured with the Hilbert-Schmidt independence criterion (HSIC). Using kernel embedding, which maps distributions into a reproducing kernel Hilbert space, the dependence between distributions can be calculated without density estimation. Experiments on synthetic and real data sets, including an application to cross-lingual topic matching, demonstrate the effectiveness of the proposed method.
ISSN:	1384-5810, 1573-756X
DOI:	10.1007/s10618-016-0470-1
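
Illustrative sketch:	The pipeline named in the abstract (groups treated as distributions, kernel embedding, HSIC maximization over matchings) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian base kernel, the use of inner products between mean embeddings as the group-level kernel, all function names, and the brute-force search over permutations are choices made only for this example. The brute-force search assumes both domains contain the same small number of groups.

```python
import itertools

import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel values between all rows of a and all rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def embedding_gram(groups, bandwidth=1.0):
    """Inner products between empirical kernel mean embeddings of groups.

    Entry (i, j) is the average kernel value over all sample pairs drawn
    from group i and group j, so no density estimation is needed.
    """
    n = len(groups)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            gram[i, j] = gaussian_gram(groups[i], groups[j], bandwidth).mean()
    return gram


def hsic(k, l):
    """Biased empirical HSIC estimate from two n x n Gram matrices."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / n**2


def match_groups(groups_x, groups_y, bandwidth=1.0):
    """Return (perm, score), where perm[i] is the Y group matched to X group i.

    Assumption for this sketch: exhaustive search over permutations, which
    is only practical for a handful of groups.
    """
    k = embedding_gram(groups_x, bandwidth)
    l = embedding_gram(groups_y, bandwidth)
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(len(groups_y))):
        idx = list(perm)
        score = hsic(k, l[np.ix_(idx, idx)])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score
```

Only within-domain kernel values are ever computed here; samples from the two domains are never compared directly, which is what makes the matching unsupervised. The number of permutations grows factorially with the number of groups, so a larger problem would need an iterative assignment solver in place of the exhaustive loop.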