Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments

Generative AI is transforming the educational landscape, raising significant concerns about cheating. Despite the widespread use of multiple-choice questions in assessments, the detection of AI cheating in MCQ-based tests has been almost unexplored, in contrast to the focus on detecting AI-cheating on text-rich student outputs. In this paper, we propose a method based on the application of Item Response Theory (IRT) to address this gap. Our approach operates on the assumption that artificial and human intelligence exhibit different response patterns, with AI cheating manifesting as deviations from the expected patterns of human responses. These deviations are modeled using Person-Fit Statistics. We demonstrate that this method effectively highlights the differences between human responses and those generated by premium versions of leading chatbots (ChatGPT, Claude, and Gemini), but that it is also sensitive to the amount of AI cheating in the data. Furthermore, we show that the chatbots differ in their reasoning profiles. Our work provides both a theoretical foundation and empirical evidence for the application of IRT to identify AI cheating in MCQ-based assessments.
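
The abstract does not specify which person-fit statistic the authors use. As an illustration only, and not the authors' implementation, the sketch below computes the standardized log-likelihood statistic lz under a two-parameter logistic (2PL) IRT model, a commonly used person-fit measure; all item parameters, the ability value, and the response vector are hypothetical.

```python
import numpy as np

def prob_2pl(theta, a, b):
    """Probability of a correct response under a 2PL IRT model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lz_person_fit(responses, theta, a, b):
    """Standardized log-likelihood person-fit statistic (lz).

    Strongly negative values flag response patterns that deviate from
    what the IRT model expects for an examinee of ability `theta`;
    this is the kind of aberrance signal person-fit methods rely on.
    """
    u = np.asarray(responses, dtype=float)
    p = prob_2pl(theta, np.asarray(a), np.asarray(b))
    # Observed log-likelihood of the scored (0/1) response pattern
    loglik = np.sum(u * np.log(p) + (1.0 - u) * np.log(1.0 - p))
    # Expectation and variance of the log-likelihood under the model
    expected = np.sum(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    variance = np.sum(p * (1.0 - p) * np.log(p / (1.0 - p)) ** 2)
    return (loglik - expected) / np.sqrt(variance)

# Hypothetical item parameters and one examinee's scored MCQ responses
a = [1.2, 0.8, 1.5, 1.0, 0.9]    # discrimination
b = [-1.0, 0.0, 0.5, 1.0, 1.5]   # difficulty
responses = [1, 1, 0, 1, 0]
print(lz_person_fit(responses, theta=0.3, a=a, b=b))
```

In a screening setting of the kind the paper describes, such a statistic would be computed for every test-taker against item parameters calibrated on human responses, and unusually low values would be flagged for review.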

Bibliographic Details
Main Authors: Strugatski, Alona; Alexandron, Giora
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence
Online Access: Full text (https://arxiv.org/abs/2412.02713)
creator Strugatski, Alona; Alexandron, Giora
doi 10.48550/arxiv.2412.02713
format Article
creationdate 2024-11-28
rights http://creativecommons.org/licenses/by-nc-nd/4.0
identifier DOI: 10.48550/arxiv.2412.02713
language eng
recordid cdi_arxiv_primary_2412_02713
source arXiv.org
subjects Computer Science - Artificial Intelligence
title Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments
url https://arxiv.org/abs/2412.02713