Examining the Calibration Process for Raters of the "GRE"® General Test. ETS GRE® Board Research Report. GRE®-19-01. Research Report Series. ETS RR-19-09

Bibliographic details
Published in:	ETS research report series 2019-12
Main authors:	Wendler, Cathy; Glazer, Nancy; Cline, Frederick
Format:	Article
Language:	English
Subjects:	Accuracy; College Entrance Examinations; Essays; Examiners; Graduate Study; Interrater Reliability; Quality Control; Scoring; Test Reliability; Writing Evaluation
Online access:	Full text

Description
Summary:	One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as a type of quality control during CR scoring. Calibration sets are designed to provide sufficient evidence that raters have understood and internalized the rubrics and can score accurately across all score points of the score scale. This study examined the calibration process used to qualify raters to score essays from the "GRE"® Analytical Writing measure. A total of 46 experienced raters participated in the study, and each rater scored up to 630 essays from 1 of 2 essay prompt types. Two research questions were evaluated: "Does calibration influence scoring accuracy?" and "Does reducing the frequency of calibration impact scoring accuracy?" While the distribution of score points represented by the essays used in the study did not necessarily reflect what raters see during operational scoring, results suggest that the influence of calibration on Day 1 remains with raters through at least 3 scoring days. Results further suggest that scoring accuracy may be moderated by prompt type. Nevertheless, study results indicate that daily calibration for GRE prompt types may not be necessary and that reducing the frequency of calibration is unlikely to reduce scoring accuracy.
ISSN:	2330-8516