Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences

Topic models have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis, because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification...


Bibliographic details
Published in: Social science computer review 2024-02, Vol.42 (1), p.224-248
Main authors: Watanabe, Kohei, Baturo, Alexander
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 248
container_issue 1
container_start_page 224
container_title Social science computer review
container_volume 42
creator Watanabe, Kohei
Baturo, Alexander
description Topic models have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis, because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification more theoretically grounded and content analysis in general more topic-specific, we have developed Seeded Sequential Latent Dirichlet allocation (LDA), extending the existing LDA algorithm, and implementing it in a widely accessible open-source package. Taking a large corpus of speeches delivered by delegates at the United Nations General Assembly as an example, we explain how our algorithm differs from the original algorithm; why it can classify sentences more accurately; how it accepts pre-defined topics in deductive or semi-deductive analysis; how such ex-ante topic mapping differs from ex-post topic mapping; how it enables topic-specific framing analysis in applied research. We also offer practical guidance on how to determine the optimal number of topics and select seed words for the algorithm.
doi_str_mv 10.1177/08944393231178605
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918983959</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_08944393231178605</sage_id><sourcerecordid>2918983959</sourcerecordid><originalsourceid>FETCH-LOGICAL-c355t-7fb51533dfe7faf48a7e8fd8b74062b3e245cef65821aa76dd0bfb8dde6d4bc53</originalsourceid><addsrcrecordid>eNp1kE1LxDAQhoMouK7-AG8Fz13z0TSpt7J-woKHricPJU0ma5ZuW5NW2H9vlhU8iKdhZp73ZeZF6JrgBSFC3GJZZBkrGGWxlTnmJ2hGOKeppDI_RbPDPj0A5-gihC3GhAqMZ-i9AjBgkgo-J-hGp9pkdV_eJWWc7FxaTQP4LxciUbab3rvxY5fY3ifrfnA6rQbQzjqdlJ1q98GFpLdR2I3QaQiX6MyqNsDVT52jt8eH9fI5Xb0-vSzLVaoZ52MqbMMJZ8xYEFbZTCoB0hrZiAzntGFAM67B5lxSopTIjcGNbaQxkJus0ZzN0c3Rd_B9_CKM9baffLwo1LQgspCs4EWkyJHSvg_Bg60H73bK72uC60OG9Z8Mo2Zx1AS1gV_X_wXfB-ZxeQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918983959</pqid></control><display><type>article</type><title>Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences</title><source>SAGE Complete A-Z List</source><source>Alma/SFX Local Collection</source><source>Sociological Abstracts</source><creator>Watanabe, Kohei ; Baturo, Alexander</creator><creatorcontrib>Watanabe, Kohei ; Baturo, Alexander</creatorcontrib><description>Topic models have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis, because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification more theoretically grounded and content analysis in general more topic-specific, we have developed Seeded Sequential Latent Dirichlet allocation (LDA), extending the existing LDA algorithm, and implementing it in a widely accessible open-source package. 
Taking a large corpus of speeches delivered by delegates at the United Nations General Assembly as an example, we explain how our algorithm differs from the original algorithm; why it can classify sentences more accurately; how it accepts pre-defined topics in deductive or semi-deductive analysis; how such ex-ante topic mapping differs from ex-post topic mapping; how it enables topic-specific framing analysis in applied research. We also offer practical guidance on how to determine the optimal number of topics and select seed words for the algorithm.</description><identifier>ISSN: 0894-4393</identifier><identifier>EISSN: 1552-8286</identifier><identifier>DOI: 10.1177/08944393231178605</identifier><language>eng</language><publisher>Los Angeles, CA: SAGE Publications</publisher><subject>Algorithms ; Classification ; Content analysis ; Dirichlet problem ; Frame analysis ; Mapping ; Sentences</subject><ispartof>Social science computer review, 2024-02, Vol.42 (1), p.224-248</ispartof><rights>The Author(s) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c355t-7fb51533dfe7faf48a7e8fd8b74062b3e245cef65821aa76dd0bfb8dde6d4bc53</citedby><cites>FETCH-LOGICAL-c355t-7fb51533dfe7faf48a7e8fd8b74062b3e245cef65821aa76dd0bfb8dde6d4bc53</cites><orcidid>0000-0001-6519-5265 ; 0000-0002-1108-5287</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/08944393231178605$$EPDF$$P50$$Gsage$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/08944393231178605$$EHTML$$P50$$Gsage$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,21798,27901,27902,33751,43597,43598</link.rule.ids></links><search><creatorcontrib>Watanabe, Kohei</creatorcontrib><creatorcontrib>Baturo, 
Alexander</creatorcontrib><title>Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences</title><title>Social science computer review</title><description>Topic models have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis, because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification more theoretically grounded and content analysis in general more topic-specific, we have developed Seeded Sequential Latent Dirichlet allocation (LDA), extending the existing LDA algorithm, and implementing it in a widely accessible open-source package. Taking a large corpus of speeches delivered by delegates at the United Nations General Assembly as an example, we explain how our algorithm differs from the original algorithm; why it can classify sentences more accurately; how it accepts pre-defined topics in deductive or semi-deductive analysis; how such ex-ante topic mapping differs from ex-post topic mapping; how it enables topic-specific framing analysis in applied research. 
We also offer practical guidance on how to determine the optimal number of topics and select seed words for the algorithm.</description><subject>Algorithms</subject><subject>Classification</subject><subject>Content analysis</subject><subject>Dirichlet problem</subject><subject>Frame analysis</subject><subject>Mapping</subject><subject>Sentences</subject><issn>0894-4393</issn><issn>1552-8286</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>AFRWT</sourceid><sourceid>BHHNA</sourceid><recordid>eNp1kE1LxDAQhoMouK7-AG8Fz13z0TSpt7J-woKHricPJU0ma5ZuW5NW2H9vlhU8iKdhZp73ZeZF6JrgBSFC3GJZZBkrGGWxlTnmJ2hGOKeppDI_RbPDPj0A5-gihC3GhAqMZ-i9AjBgkgo-J-hGp9pkdV_eJWWc7FxaTQP4LxciUbab3rvxY5fY3ifrfnA6rQbQzjqdlJ1q98GFpLdR2I3QaQiX6MyqNsDVT52jt8eH9fI5Xb0-vSzLVaoZ52MqbMMJZ8xYEFbZTCoB0hrZiAzntGFAM67B5lxSopTIjcGNbaQxkJus0ZzN0c3Rd_B9_CKM9baffLwo1LQgspCs4EWkyJHSvg_Bg60H73bK72uC60OG9Z8Mo2Zx1AS1gV_X_wXfB-ZxeQ</recordid><startdate>202402</startdate><enddate>202402</enddate><creator>Watanabe, Kohei</creator><creator>Baturo, Alexander</creator><general>SAGE Publications</general><general>SAGE PUBLICATIONS, INC</general><scope>AFRWT</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7U4</scope><scope>8FD</scope><scope>BHHNA</scope><scope>DWI</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>WZK</scope><orcidid>https://orcid.org/0000-0001-6519-5265</orcidid><orcidid>https://orcid.org/0000-0002-1108-5287</orcidid></search><sort><creationdate>202402</creationdate><title>Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences</title><author>Watanabe, Kohei ; Baturo, 
Alexander</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c355t-7fb51533dfe7faf48a7e8fd8b74062b3e245cef65821aa76dd0bfb8dde6d4bc53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Content analysis</topic><topic>Dirichlet problem</topic><topic>Frame analysis</topic><topic>Mapping</topic><topic>Sentences</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Watanabe, Kohei</creatorcontrib><creatorcontrib>Baturo, Alexander</creatorcontrib><collection>Sage Journals GOLD Open Access 2024</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Sociological Abstracts (pre-2017)</collection><collection>Technology Research Database</collection><collection>Sociological Abstracts</collection><collection>Sociological Abstracts</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Sociological Abstracts (Ovid)</collection><jtitle>Social science computer review</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Watanabe, Kohei</au><au>Baturo, Alexander</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences</atitle><jtitle>Social science computer review</jtitle><date>2024-02</date><risdate>2024</risdate><volume>42</volume><issue>1</issue><spage>224</spage><epage>248</epage><pages>224-248</pages><issn>0894-4393</issn><eissn>1552-8286</eissn><abstract>Topic models 
have been widely used by researchers across disciplines to automatically analyze large textual data. However, they often fail to automate content analysis, because the algorithms cannot accurately classify individual sentences into pre-defined topics. Aiming to make topic classification more theoretically grounded and content analysis in general more topic-specific, we have developed Seeded Sequential Latent Dirichlet allocation (LDA), extending the existing LDA algorithm, and implementing it in a widely accessible open-source package. Taking a large corpus of speeches delivered by delegates at the United Nations General Assembly as an example, we explain how our algorithm differs from the original algorithm; why it can classify sentences more accurately; how it accepts pre-defined topics in deductive or semi-deductive analysis; how such ex-ante topic mapping differs from ex-post topic mapping; how it enables topic-specific framing analysis in applied research. We also offer practical guidance on how to determine the optimal number of topics and select seed words for the algorithm.</abstract><cop>Los Angeles, CA</cop><pub>SAGE Publications</pub><doi>10.1177/08944393231178605</doi><tpages>25</tpages><orcidid>https://orcid.org/0000-0001-6519-5265</orcidid><orcidid>https://orcid.org/0000-0002-1108-5287</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0894-4393
ispartof Social science computer review, 2024-02, Vol.42 (1), p.224-248
issn 0894-4393
1552-8286
language eng
recordid cdi_proquest_journals_2918983959
source SAGE Complete A-Z List; Alma/SFX Local Collection; Sociological Abstracts
subjects Algorithms
Classification
Content analysis
Dirichlet problem
Frame analysis
Mapping
Sentences
title Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T12%3A24%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Seeded%20Sequential%20LDA:%20A%20Semi-Supervised%20Algorithm%20for%20Topic-Specific%20Analysis%20of%20Sentences&rft.jtitle=Social%20science%20computer%20review&rft.au=Watanabe,%20Kohei&rft.date=2024-02&rft.volume=42&rft.issue=1&rft.spage=224&rft.epage=248&rft.pages=224-248&rft.issn=0894-4393&rft.eissn=1552-8286&rft_id=info:doi/10.1177/08944393231178605&rft_dat=%3Cproquest_cross%3E2918983959%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918983959&rft_id=info:pmid/&rft_sage_id=10.1177_08944393231178605&rfr_iscdi=true