Detecting ChatGPT-generated essays in a large-scale writing assessment: Is there a bias against non-native English speakers?
Published in: Computers and Education, 2024-08, Vol. 217, p. 105070, Article 105070
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: With the prevalence of generative AI tools like ChatGPT, automated detectors of AI-generated texts have been increasingly used in education to detect the misuse of these tools (e.g., cheating in assessments). Recently, the responsible use of these detectors has attracted considerable attention. Research has shown that publicly available detectors are more likely to misclassify essays written by non-native English speakers as AI-generated than those written by native English speakers. In this study, we address these concerns by leveraging carefully sampled large-scale data from the Graduate Record Examinations (GRE) writing assessment. We developed multiple detectors of ChatGPT-generated essays based on linguistic features from the ETS e-rater engine and text perplexity features, and investigated their performance and potential bias. Results showed that our carefully constructed detectors not only achieved near-perfect detection accuracy, but also showed no evidence of bias disadvantaging non-native English speakers. Findings of this study contribute to the ongoing debates surrounding the formulation of policies for utilizing AI-generated content detectors in education.
Highlights:
• We study the potential bias in detecting ChatGPT-generated essays in a large-scale assessment.
• Detectors based on linguistic features showed near-perfect detection performance.
• Detectors built using well-sampled data from GRE do not show bias against non-native English speakers.
• Findings shed light on the fairness in applying automated LLM detectors in education.
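Note: the abstract refers to text perplexity features as one input to the detectors. As background only, the minimal sketch below shows one common way such a feature can be computed with an off-the-shelf language model; the choice of model ("gpt2"), the Hugging Face transformers library, and the single-feature setup are illustrative assumptions and do not reproduce the detectors described in the article.

```python
# Illustrative sketch (not the authors' implementation): computing a
# text-perplexity feature with an off-the-shelf GPT-2 model.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return exp(mean token-level cross-entropy) of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels equal to input_ids, the model returns the mean
        # cross-entropy loss over the (shifted) token sequence.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Lower perplexity is often (though not always) associated with
# machine-generated text, so the value can serve as one feature
# in a downstream classifier alongside linguistic features.
print(perplexity("The essay prompt asks whether technology improves education."))
```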
ISSN: 0360-1315, 1873-782X
DOI: 10.1016/j.compedu.2024.105070