Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences
Published in: | Social Science Computer Review 2024-02, Vol.42 (1), p.224-248 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Topic models have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification more theoretically grounded and content analysis in general more topic-specific, we have developed Seeded Sequential Latent Dirichlet Allocation (LDA), which extends the existing LDA algorithm, and implemented it in a widely accessible open-source package. Taking a large corpus of speeches delivered by delegates at the United Nations General Assembly as an example, we explain how our algorithm differs from the original algorithm; why it can classify sentences more accurately; how it accepts pre-defined topics in deductive or semi-deductive analysis; how such ex-ante topic mapping differs from ex-post topic mapping; and how it enables topic-specific framing analysis in applied research. We also offer practical guidance on how to determine the optimal number of topics and select seed words for the algorithm. |
ISSN: | 0894-4393, 1552-8286 |
DOI: | 10.1177/08944393231178605 |