Crystallizing Schemas with Teleoscope: Thematic Curation of Large Text Corpora
Format: | Article |
---|---|
Language: | English |
Abstract: | Making sense of large text corpora is difficult when they grow to thousands
or millions of documents. With the advent of large language models (LLMs), the
potential for large-scale sense-making is being realized. However, this creates
a need for rigour in the data curation stage of thematic analysis: selecting
the right documents to achieve appropriate information power (saturation)
requires an auditable trace of researchers' thought processes.
In this paper, we present methodological and design findings from a
three-year design process in which we worked with qualitative researchers to
create Teleoscope, an open-source platform designed to rigorously curate
documents at scale. By implementing the qualitative research values common to
thematic analysis during the curation stage (which we call thematic curation),
we found that researchers could come to a shared understanding of a large
corpus and feel confident in their curation decisions (which we call schema
crystallization). |
---|---|
DOI: | 10.48550/arxiv.2402.06124 |