Multiclass Classification of Policy Documents with Large Language Models
Saved in:
Main Authors: 
Format: Article
Language: English
Subjects: 
Online Access: Order full text
Abstract: Classifying policy documents into policy issue topics has been a long-time
effort in political science and communication disciplines. Efforts to automate
text classification processes for social science research purposes have so far
achieved remarkable results, but there is still large room for progress. In
this work, we test the prediction performance of an alternative strategy that
requires much less human involvement than full manual coding. We use OpenAI's
GPT-3.5 and GPT-4 models, which are pre-trained, instruction-tuned Large
Language Models (LLMs), to classify congressional bills and congressional
hearings into the Comparative Agendas Project's 21 major policy issue topics. We
propose three use-case scenarios and estimate overall accuracies ranging from
58% to 83%, depending on the scenario and the GPT model employed. The three
scenarios aim at minimal, moderate, and major human interference, respectively.
Overall, our results point towards the insufficiency of complete reliance on GPT
with minimal human intervention, an accuracy that increases with the human
effort exerted, and a surprisingly high accuracy achieved in the use case
demanding the most human effort. However, the best-performing use case achieved
83% accuracy on the 65% of the data on which the two models agreed, suggesting
that an approach similar to ours can be implemented relatively easily and allows
for mostly automated coding of a majority of a given dataset. This could free up
resources, allowing manual human coding of the remaining 35% of the data to
achieve a higher overall level of accuracy while reducing costs significantly.
DOI: 10.48550/arxiv.2310.08167
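
The agreement-based scenario described in the abstract (query both GPT models, keep only the documents on which their labels agree, and defer the rest to human coders) can be sketched with the OpenAI chat API. This is a minimal illustration, not the authors' actual pipeline: the prompt wording, the model identifiers ("gpt-3.5-turbo", "gpt-4"), and the exact topic-name strings are assumptions made for the example.

```python
# Hedged sketch: classify a document into one Comparative Agendas Project (CAP)
# major topic with two OpenAI chat models and keep only labels on which they agree.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The 21 CAP major topics (names abbreviated from the CAP codebook).
CAP_TOPICS = [
    "Macroeconomics", "Civil Rights", "Health", "Agriculture", "Labor",
    "Education", "Environment", "Energy", "Immigration", "Transportation",
    "Law and Crime", "Social Welfare", "Housing", "Domestic Commerce",
    "Defense", "Technology", "Foreign Trade", "International Affairs",
    "Government Operations", "Public Lands", "Culture",
]

def classify(text: str, model: str) -> str:
    """Ask one model for a single CAP major-topic label (prompt wording is hypothetical)."""
    prompt = (
        "Assign the following congressional bill or hearing description to exactly "
        f"one of these policy topics: {', '.join(CAP_TOPICS)}.\n\n"
        f"Text: {text}\n\nAnswer with the topic name only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

def classify_with_agreement(text: str) -> str | None:
    """Return a label only when both models agree; None means 'send to a human coder'."""
    label_35 = classify(text, "gpt-3.5-turbo")
    label_4 = classify(text, "gpt-4")
    return label_35 if label_35 == label_4 else None
```

In this setup, documents for which `classify_with_agreement` returns `None` would be routed to manual coding, mirroring the abstract's suggestion of automating the majority of a dataset while reserving human effort for the disagreements.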