Learning Automated Essay Scoring Models Using Item-Response-Theory-Based Scores to Decrease Effects of Rater Biases

Bibliographic Details
Published in: IEEE Transactions on Learning Technologies, 2021-12, Vol. 14 (6), p. 763-776
Main Authors: Uto, Masaki; Okano, Masashi
Format: Article
Language: English
Online Access: Full text
Description
Summary: In automated essay scoring (AES), scores are assigned to essays automatically as an alternative to human grading. Traditional AES typically relies on handcrafted features, whereas recent studies have proposed AES models based on deep neural networks that obviate the need for feature engineering. These models generally require training on a large dataset of graded essays. However, the grades in such a dataset are known to be biased by rater characteristics when each essay is graded by only a few raters from a larger rater pool. The performance of AES models drops when such biased data are used for training. Researchers in educational and psychological measurement have recently proposed item response theory (IRT) models that estimate essay scores while accounting for rater biases. This study therefore proposes a new method that trains AES models on IRT-based scores to mitigate the effects of rater bias in the training data.
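To make the two-stage idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It replaces the paper's IRT models (the many-facet Rasch family) with a deliberately simplified linear rater model (rating = true score + rater severity + noise) fitted by alternating least squares, and replaces the neural AES model with a least-squares fit on random placeholder features. All names, sizes, and data below are illustrative assumptions.

```python
# Sketch of: (1) estimating bias-corrected essay scores from ratings given by
# a few raters per essay, (2) training an AES regressor on those scores
# instead of raw ratings. Simplified stand-in for the paper's IRT approach.
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated data: each essay is rated by a few raters from a larger pool.
n_essays, n_raters, ratings_per_essay = 200, 10, 3
true_scores = rng.normal(0.0, 1.0, n_essays)   # latent essay quality
severity = rng.normal(0.0, 0.5, n_raters)      # per-rater bias (severity)

rows = []  # (essay_id, rater_id, observed_rating)
for e in range(n_essays):
    for r in rng.choice(n_raters, ratings_per_essay, replace=False):
        rows.append((e, r, true_scores[e] + severity[r] + rng.normal(0.0, 0.3)))
essay_id, rater_id, rating = (np.array(col) for col in zip(*rows))

# --- Stage 1: alternating least squares for essay scores and rater severities.
theta = np.zeros(n_essays)   # bias-corrected score estimates
beta = np.zeros(n_raters)    # rater severity estimates
for _ in range(50):
    for e in range(n_essays):             # update each essay given severities
        m = essay_id == e
        theta[e] = (rating[m] - beta[rater_id[m]]).mean()
    for r in range(n_raters):             # update each rater given scores
        m = rater_id == r
        beta[r] = (rating[m] - theta[essay_id[m]]).mean()
    beta -= beta.mean()                   # anchor the scale (identifiability)

# --- Stage 2: train the AES model on theta instead of raw mean ratings.
# Random features are a placeholder for features extracted from essay text;
# in the paper this stage is a deep neural AES model.
features = rng.normal(size=(n_essays, 32))
w, *_ = np.linalg.lstsq(features, theta, rcond=None)

# Bias-corrected targets should track true quality better than raw averages.
raw_mean = np.array([rating[essay_id == e].mean() for e in range(n_essays)])
print("corr(raw mean rating, true score):", np.corrcoef(raw_mean, true_scores)[0, 1])
print("corr(estimated theta, true score):", np.corrcoef(theta, true_scores)[0, 1])
```

The point of the comparison in the last two lines is the paper's motivation: targets that subtract estimated rater severities correlate better with true essay quality than raw averaged ratings, so a model trained on them inherits less rater bias.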
ISSN: 1939-1382, 2372-0050
DOI: 10.1109/TLT.2022.3145352