Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
DOI: 10.48550/arxiv.2107.02173
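
The abstract refers to automated coherence metrics that score a topic's top words by how often they co-occur in a reference corpus, typically via normalized pointwise mutual information (NPMI). The sketch below is a minimal, illustrative version of such a score, not the implementation used in the paper: the function name, document-level co-occurrence counting, toy corpus, and edge-case conventions are assumptions for demonstration only.

```python
import itertools
import math
from collections import Counter

def npmi_coherence(topic_words, reference_docs):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document-level co-occurrence counts
    in a tokenized reference corpus, in the spirit of standard automated
    coherence metrics (names and conventions here are illustrative).
    """
    n_docs = len(reference_docs)
    doc_sets = [set(doc) for doc in reference_docs]

    # Document frequencies for single words and for word pairs.
    word_df = Counter()
    pair_df = Counter()
    for doc in doc_sets:
        present = sorted(w for w in topic_words if w in doc)
        word_df.update(present)
        pair_df.update(itertools.combinations(present, 2))

    scores = []
    for w1, w2 in itertools.combinations(sorted(set(topic_words)), 2):
        p1 = word_df[w1] / n_docs
        p2 = word_df[w2] / n_docs
        p12 = pair_df[(w1, w2)] / n_docs
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI by convention
        elif p12 == 1.0:
            scores.append(1.0)   # co-occur in every document: maximum NPMI
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)

# Toy usage with a tiny tokenized "reference corpus" (illustrative data).
corpus = [
    ["topic", "model", "coherence", "corpus"],
    ["neural", "topic", "model", "evaluation"],
    ["human", "judgment", "word", "intrusion"],
]
print(npmi_coherence(["topic", "model", "coherence"], corpus))
```

In practice, coherence is usually computed with an established toolkit over a large reference corpus and a sliding co-occurrence window; the paper's point is that such automated scores, however computed, had not been re-validated against human judgments for neural topic models.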