Reducing Workload in Short Answer Grading Using Machine Learning

Machine learning methods can be used to reduce the manual workload in exam grading, making it possible for teachers to spend more time on other tasks. However, when it comes to grading exams, fully eliminating manual work is not yet possible even with very accurate automated grading, as any grading...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of artificial intelligence in education 2024-06, Vol.34 (2), p.247-273
Hauptverfasser: Weegar, Rebecka, Idestam-Almquist, Peter
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Machine learning methods can be used to reduce the manual workload in exam grading, making it possible for teachers to spend more time on other tasks. However, when it comes to grading exams, fully eliminating manual work is not yet possible even with very accurate automated grading, as any grading mistakes could have significant consequences for the students. Here, the evaluation of an automated grading approach is therefore extended from measuring workload in relation to the accuracy of automated grading, to also measuring the overall workload required to correctly grade a full exam, with and without the support of machine learning. The evaluation was performed during an introductory computer science course with over 400 students. The exam consisted of 64 questions with relatively short answers and a two-step approach for automated grading was applied. First, a subset of answers to the exam questions was manually graded and next used as training data for machine learning models classifying the remaining answers. A number of different strategies for how to select which answers to include in the training data were evaluated. The time spent on different grading actions was measured along with the reduction of effort using clustering of answers and automated scoring. Compared to fully manual grading, the overall reduction of workload was substantial—between 64% and 74%—even with a complete manual review of all classifier output to ensure a fair grading.
ISSN:1560-4292
1560-4306
1560-4306
DOI:10.1007/s40593-022-00322-1