MPI/OpenMP Hybrid Parallel Inference Methods for Latent Dirichlet Allocation — Approximation and Evaluation

Bibliographic details
Published in:	IEICE Transactions on Information and Systems 2013/05/01, Vol.E96.D(5), pp.1006-1015
Main authors:	TORA, Shotaro, EGUCHI, Koji
Format:	Article
Language:	English
Subjects:	Applied sciences; Clusters; Computation; Dirichlet problem; Electrical engineering. Electrical power engineering; Electronic equipment and fabrication. Passive components, printed wiring boards, connectics; Electronics; Exact sciences and technology; Gibbs sampling; Inference; Latent Dirichlet allocation; Mathematical models; Message passing; MPI/OpenMP hybrid parallelization; Parallel processing; Power electronics, power supplies; Probabilistic topic models; Sampling
Online access:	Full text

Description
Abstract:	Recently, probabilistic topic models have been applied to various types of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet allocation (LDA) is a well-known topic model. Variational Bayesian inference or collapsed Gibbs sampling is often used to estimate parameters in LDA; however, these inference methods incur high computational cost for large-scale data, so more efficient inference techniques are needed. We use parallel computation for efficient collapsed Gibbs sampling inference for LDA. We assume a symmetric multiprocessing (SMP) cluster, a platform that has been widely used in recent years. Prior work on parallel inference for LDA has often used either MPI or OpenMP alone. For an SMP cluster, however, it is more suitable to adopt hybrid parallelization that uses message passing for communication between SMP nodes and loop directives for parallelization within each SMP node. We developed an MPI/OpenMP hybrid parallel inference method for LDA and evaluated its performance under various settings of an SMP cluster. We further investigated an approximation that controls the inter-node communications and found that it achieved a noticeable increase in inference speed while maintaining inference accuracy.
ISSN:	0916-8532, 1745-1361
DOI:	10.1587/transinf.E96.D.1006
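
Illustrative sketch (not part of the catalog record): the abstract describes collapsed Gibbs sampling split across SMP nodes, with message passing between nodes and an approximation that limits inter-node communication. The single-process Python sketch below imitates that general idea in the style of approximate distributed LDA, and it is not the authors' implementation. Each "node" is just a list entry holding a shard of documents and a local copy of the word-topic counts, and the merge step stands in for an MPI all-reduce. The names `gibbs_sweep`, `run_simulated_nodes`, and `sync_every` are hypothetical, and treating the approximation as "synchronize every few sweeps" is an assumption. In a real hybrid code, the loop over documents inside a node is where an OpenMP loop directive would go.

```python
import random


def gibbs_sweep(docs, z, doc_topic, word_topic, topic_total, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over one shard, updating counts in place."""
    for d, doc in enumerate(docs):  # a real node would parallelize this loop
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            doc_topic[d][k] -= 1
            word_topic[w][k] -= 1
            topic_total[k] -= 1
            # Unnormalized conditional probability of each topic for this token.
            weights = [
                (doc_topic[d][t] + alpha) * (word_topic[w][t] + beta)
                / (topic_total[t] + V * beta)
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1


def column_sums(word_topic, K):
    """Total number of tokens assigned to each topic."""
    return [sum(row[k] for row in word_topic) for k in range(K)]


def run_simulated_nodes(shards, K, V, sweeps=6, sync_every=2,
                        alpha=0.1, beta=0.01, seed=0):
    """Run Gibbs sweeps on simulated nodes, merging counts every sync_every sweeps."""
    rng = random.Random(seed)
    # Random initial topic assignments and the matching counts.
    z = [[[rng.randrange(K) for _ in doc] for doc in shard] for shard in shards]
    doc_topic = [[[0] * K for _ in shard] for shard in shards]
    global_wt = [[0] * K for _ in range(V)]
    for p, shard in enumerate(shards):
        for d, doc in enumerate(shard):
            for i, w in enumerate(doc):
                doc_topic[p][d][z[p][d][i]] += 1
                global_wt[w][z[p][d][i]] += 1
    # Every node starts from an identical copy of the global counts.
    local_wt = [[row[:] for row in global_wt] for _ in shards]
    local_tot = [column_sums(global_wt, K) for _ in shards]

    for sweep in range(1, sweeps + 1):
        for p, shard in enumerate(shards):  # each p plays the role of one node
            gibbs_sweep(shard, z[p], doc_topic[p], local_wt[p], local_tot[p],
                        K, V, alpha, beta, rng)
        if sweep % sync_every == 0 or sweep == sweeps:
            # Stand-in for an all-reduce: add every node's change to the global counts.
            global_wt = [
                [global_wt[w][k] + sum(lw[w][k] - global_wt[w][k] for lw in local_wt)
                 for k in range(K)]
                for w in range(V)
            ]
            local_wt = [[row[:] for row in global_wt] for _ in shards]
            local_tot = [column_sums(global_wt, K) for _ in shards]
    return global_wt, z
```

Larger values of `sync_every` mean fewer merges, so each node samples against staler counts from the other nodes; that trade-off between communication and exactness is the kind of approximation the abstract alludes to, though the paper's actual scheme may differ.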