LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods.
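The workflow the abstract describes, prompting an LLM with codebook definitions and then checking chance-corrected agreement against human coders, can be sketched roughly as follows. This is a minimal illustration only: the codebook, the prompt wording, and the helper names (`build_prompt`, `cohen_kappa`) are assumptions for the sketch, not the paper's actual prompts or pipeline.

```python
# Illustrative sketch of LLM-assisted deductive coding (LACA-style).
# The codebook and prompt template below are hypothetical examples.

CODEBOOK = {
    "cost": "Mentions price, affordability, or expense.",
    "flavor": "Mentions taste or flavor of the product.",
}

def build_prompt(codebook: dict, document: str) -> str:
    """Assemble a zero-shot deductive-coding prompt from codebook definitions."""
    defs = "\n".join(f"- {code}: {desc}" for code, desc in codebook.items())
    return (
        "You are coding documents for a content analysis.\n"
        f"Code definitions:\n{defs}\n\n"
        f"Document: {document}\n"
        "For each code, answer 1 if it applies and 0 otherwise. "
        'Respond as JSON, e.g. {"cost": 0, "flavor": 1}.'
    )

def cohen_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement between two binary coders.

    A kappa near 0 indicates the second coder (e.g. the LLM) is doing
    no better than random guessing on that code.
    """
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n                 # marginal rates
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)            # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 1.0
```

In this sketch, each document would be sent to the model with `build_prompt(...)`, the JSON responses parsed into per-code labels, and `cohen_kappa` computed per code against a human-coded sample to decide where the model is reliable and where human coders are still needed.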
Saved in:
Published in: | arXiv.org 2023-06 |
---|---|
Main authors: | Chew, Robert ; Bollenbacher, John ; Wenger, Michael ; Speer, Jessica ; Kim, Annice |
Format: | Article |
Language: | eng |
Subjects: | Coders ; Coding ; Content analysis ; Datasets ; Documents ; Empirical analysis ; Large language models ; Natural language processing ; Qualitative research ; Research methodology ; Unstructured data |
Online access: | Full text |
container_title | arXiv.org |
creator | Chew, Robert ; Bollenbacher, John ; Wenger, Michael ; Speer, Jessica ; Kim, Annice |
description | Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods. |
format | Article |
published | 2023-06-23 |
publisher | Cornell University Library, arXiv.org (Ithaca) |
rights | 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-06 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2830494418 |
source | Free E-Journals |
subjects | Coders ; Coding ; Content analysis ; Datasets ; Documents ; Empirical analysis ; Large language models ; Natural language processing ; Qualitative research ; Research methodology ; Unstructured data |
title | LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding |