Topic model based on co-occurrence word networks for unbalanced short text datasets
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | We propose a straightforward solution for detecting scarce topics in
unbalanced short-text datasets. Our approach, named CWUTM (Topic model based on
co-occurrence word networks for unbalanced short text datasets), addresses the
challenge of sparse and unbalanced short-text topics by mitigating the effects
of incidental word co-occurrence. This allows our model to prioritize the
identification of scarce (low-frequency) topics. Unlike previous methods, CWUTM
leverages co-occurrence word networks to capture the topic distribution of each
word, and we enhance its sensitivity to scarce topics by redefining the
calculation of node activity and partially normalizing the representation of
both scarce and abundant topics. Moreover, CWUTM adopts Gibbs sampling, as LDA
does, making it easily adaptable to various application scenarios. Our
extensive experiments on unbalanced short-text datasets demonstrate the
superiority of CWUTM over baseline approaches in discovering scarce topics.
According to the experimental results, the proposed model is effective in early
and accurate detection of emerging topics or unexpected events on social
platforms. |
---|---|
DOI: | 10.48550/arxiv.2311.02566 |