The exact asymptotic form of Bayesian generalization error in latent Dirichlet allocation

Bibliographic Details
Published in: Neural Networks, 2021-05, Vol. 137, p. 127-137
Author: Hayashi, Naoki
Format: Article
Language: English
Description
Abstract: Latent Dirichlet allocation (LDA) obtains essential information from data by using Bayesian inference. It is applied to knowledge discovery via dimensionality reduction and clustering in many fields. However, its generalization error had not yet been clarified, since LDA is a singular statistical model in which there is no one-to-one mapping from parameters to probability distributions. In this paper, we give the exact asymptotic forms of its generalization error and marginal likelihood by theoretically analyzing its learning coefficient using algebraic geometry. The theoretical result shows that the Bayesian generalization error in LDA is expressed in terms of that in matrix factorization plus a penalty arising from the simplex restriction of LDA's parameter region. A numerical experiment is consistent with the theoretical result.

Highlights:
• Latent Dirichlet allocation (LDA) is a widely used statistical model in many fields.
• The Bayesian generalization error (GE) in LDA has been unknown because the model is singular.
• We determine the learning coefficient of the GE in LDA via resolution of singularities.
• The main result shows the behavior of the GE in LDA as the number of topics or the sample size increases.
• The experimental result is consistent with the main theorem when the sample size is finite.
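For context, a minimal sketch of the general asymptotic forms from singular learning theory that results of this kind instantiate; the learning coefficient λ and its multiplicity m are generic symbols here, not the LDA-specific values derived in the paper:

\[
\mathbb{E}[G_n] = \frac{\lambda}{n} - \frac{m-1}{n \log n} + o\!\left(\frac{1}{n \log n}\right),
\qquad
F_n = n S_n + \lambda \log n - (m-1) \log\log n + O_p(1),
\]

where G_n is the Bayesian generalization error, F_n the free energy (negative log marginal likelihood), and S_n the empirical entropy of the true distribution. Determining λ (and m) for LDA, as the paper does via resolution of singularities, yields the exact leading-order behavior referred to in the abstract.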
ISSN: 0893-6080, 1879-2782
DOI: 10.1016/j.neunet.2021.01.024