Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods?
Saved in:
Published in: Computer methods and programs in biomedicine, 2024-10, Vol. 255, p. 108356, Article 108356
Authors: , , , ,
Format: Article
Language: English
Online access: Full text
Abstract:
• Large language models show great promise in medical research, but health privacy challenges limit usage.
• We implement a locally-hosted, open-source LLM that allows us to analyze protected clinical data.
• The local LLM generates codes and themes as key phases of thematic analysis of clinical qualitative interviews.
• Three evaluation strategies for comparing LLM output to human output show robust similarities.
Large language models (LLMs) are generative artificial intelligence systems that have ignited much interest and discussion about their utility in clinical and research settings. Despite this interest, there is sparse analysis of their use in qualitative thematic analysis comparing their current ability to that of human coding and analysis. In addition, no analysis of their use on real-world, protected health information has been published.
Here we fill that gap in the literature by comparing an LLM to standard human thematic analysis in real-world, semi-structured interviews of both patients and clinicians within a psychiatric setting.
Using a 70-billion-parameter open-source LLM running on local hardware and advanced prompt engineering techniques, we produced themes that summarized a full corpus of interviews in minutes. Subsequently, we applied three different evaluation methods to quantify the similarity between themes produced by the LLM and those produced by humans.
These revealed similarities ranging from moderate to substantial (Jaccard similarity coefficients 0.44–0.69), which are promising preliminary results.
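The Jaccard coefficient reported above is the standard set-overlap metric: the size of the intersection of two theme sets divided by the size of their union. A minimal sketch of how such a comparison could be computed, using hypothetical theme labels (not taken from the study):

```python
# Sketch of the Jaccard similarity coefficient used to compare theme sets.
# The theme labels below are illustrative placeholders, not study data.

def jaccard_similarity(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0  # two empty sets are conventionally treated as identical
    return len(a & b) / len(a | b)

# Hypothetical example: 3 shared themes, 5 themes in the union.
human_themes = {"stigma", "access to care", "trust in clinicians", "medication concerns"}
llm_themes = {"stigma", "access to care", "trust in clinicians", "family support"}

print(round(jaccard_similarity(human_themes, llm_themes), 2))  # prints 0.6
```

In practice, matching free-text themes requires a prior step (e.g., human judgment or semantic matching) to decide when an LLM theme and a human theme count as "the same" element before the set arithmetic applies.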
Our study demonstrates that open-source LLMs can effectively generate robust themes from qualitative data, achieving substantial similarity to human-generated themes. The validation of LLMs in thematic analysis, coupled with evaluation methodologies, highlights their potential to enhance and democratize qualitative research across diverse fields.
ISSN: 0169-2607, 1872-7565
DOI: | 10.1016/j.cmpb.2024.108356 |