The PARIS Algorithm for Determining Latent Topics

Bibliographic details
Main authors: Aharon, M; Cohen, I; Itskovitch, A; Marhaim, I; Banner, R
Format: Conference proceeding
Language: English
Description
Abstract: We introduce a new method for discovering latent topics in sets of objects, such as documents. Our method, which we call PARIS (for Principal Atoms Recognition In Sets), aims to detect principal sets of elements that tend to appear together frequently and that represent latent topics in the data. These latent topics, which we refer to as 'atoms', are used as the basis for clustering, classification, collaborative filtering, and more. We develop a target function that balances compression against low representation error, and an algorithm that minimizes this function. Optimizing the target function enables automatic discovery of both the number of atoms, which represents the dimensionality of the data, and the atoms themselves, all in a single iterative procedure. We demonstrate PARIS's ability to discover latent topics, even when they are arranged hierarchically, on synthetic, document, and movie ranking data, showing improved performance compared to existing algorithms, such as LDA, on text analysis and collaborative filtering tasks.
ISSN: 2375-9232, 2375-9259
DOI: 10.1109/ICDMW.2010.187