Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts
Format: | Article |
Language: | English |
Abstract: | Codebooks -- documents that operationalize concepts and outline annotation procedures -- are used almost universally by social scientists when coding political texts. To code these texts automatically, researchers are increasingly turning to generative large language models (LLMs). However, there is limited empirical evidence on whether "off-the-shelf" LLMs faithfully follow real-world codebook operationalizations and measure complex political constructs with sufficient accuracy. To address this, we gather and curate three real-world political science codebooks -- covering protest events, political violence, and manifestos -- along with their unstructured texts and human labels. We also propose a five-stage framework for codebook-LLM measurement: preparing a codebook for both humans and LLMs, testing LLMs' basic capabilities on a codebook, evaluating zero-shot measurement accuracy (i.e., off-the-shelf performance), analyzing errors, and further (parameter-efficient) supervised training of LLMs. We provide an empirical demonstration of this framework using our three codebook datasets and several pretrained 7-12 billion parameter open-weight LLMs. We find that current open-weight LLMs have limitations in following codebooks zero-shot, but that supervised instruction tuning can substantially improve performance. Rather than suggesting the "best" LLM, our contribution lies in our codebook datasets, evaluation framework, and guidance for applied researchers who wish to implement their own codebook-LLM measurement projects. |
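The zero-shot stage of the framework amounts to prompting an instruction-tuned LLM with the codebook text plus one document and parsing the returned label. Below is a minimal sketch of that step, assuming a Hugging Face `transformers` text-generation pipeline, an illustrative protest-event codebook snippet, and one example 7B instruct model; the actual codebooks, prompts, label sets, and models used in the paper differ.

```python
# Minimal sketch (not the authors' pipeline): zero-shot codebook-LLM measurement
# with an open-weight instruction-tuned model via Hugging Face transformers.
# The codebook text, label set, and model name are illustrative assumptions.
from transformers import pipeline

CODEBOOK = """You are annotating news sentences for protest events.
Label PROTEST if the sentence describes a public, collective action
making a political claim; otherwise label NONE.
Answer with exactly one label: PROTEST or NONE."""

def zero_shot_label(text: str, generator) -> str:
    """Prompt the LLM with the codebook plus one document and parse its label."""
    prompt = f"{CODEBOOK}\n\nText: {text}\nLabel:"
    out = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    completion = out[len(prompt):].strip().upper()
    return "PROTEST" if completion.startswith("PROTEST") else "NONE"

if __name__ == "__main__":
    # Any 7-12B parameter open-weight instruct model could be swapped in here.
    generator = pipeline("text-generation",
                         model="mistralai/Mistral-7B-Instruct-v0.2")
    print(zero_shot_label("Hundreds marched downtown demanding wage increases.",
                          generator))
```

Accuracy of such a loop against the held-out human labels is what the framework's zero-shot evaluation stage measures; the later stages (error analysis, parameter-efficient supervised training) build on the same prompt format.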
DOI: | 10.48550/arxiv.2407.10747 |