Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Generative AI is transforming the educational landscape, raising significant concerns about cheating. Despite the widespread use of multiple-choice questions (MCQs) in assessments, the detection of AI cheating in MCQ-based tests has been almost unexplored, in contrast to the focus on detecting AI cheating in text-rich student outputs. In this paper, we propose a method based on the application of Item Response Theory (IRT) to address this gap. Our approach operates on the assumption that artificial and human intelligence exhibit different response patterns, with AI cheating manifesting as deviations from the expected patterns of human responses. These deviations are modeled using Person-Fit Statistics. We demonstrate that this method effectively highlights the differences between human responses and those generated by premium versions of leading chatbots (ChatGPT, Claude, and Gemini), but that it is also sensitive to the amount of AI cheating in the data. Furthermore, we show that the chatbots differ in their reasoning profiles. Our work provides both a theoretical foundation and empirical evidence for the application of IRT to identify AI cheating in MCQ-based assessments.
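As a rough illustration of the kind of person-fit analysis the abstract describes, the following is a minimal sketch, not the authors' code: it computes the standardized log-likelihood person-fit statistic l_z under a 2PL IRT model. The specific person-fit statistic, IRT model, and item parameters used in the paper are not given here, so all of them are assumptions made purely for illustration; strongly negative l_z values flag response patterns that are unlikely under a model calibrated on human responses.

```python
# Minimal sketch (NOT the paper's implementation): l_z person-fit statistic
# under an assumed 2PL IRT model, with made-up item parameters.
import numpy as np

def prob_2pl(theta, a, b):
    """P(correct answer) for each item under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lz_statistic(responses, theta, a, b):
    """Standardized log-likelihood person-fit statistic l_z.

    responses : 0/1 vector of scored answers for one test taker
    theta     : the test taker's estimated ability
    a, b      : item discrimination and difficulty parameters
    Strongly negative values indicate response patterns that deviate
    from what the human-calibrated IRT model expects.
    """
    p = prob_2pl(theta, a, b)
    log_lik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (log_lik - expected) / np.sqrt(variance)

# Toy usage with hypothetical item parameters and one response vector.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])   # discriminations (assumed)
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])  # difficulties (assumed)
u = np.array([1, 1, 0, 1, 0])             # scored responses (assumed)
print(lz_statistic(u, theta=0.3, a=a, b=b))
```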
DOI: 10.48550/arxiv.2412.02713