Automated Tools for Subject Matter Expert Evaluation of Automated Scoring

Bibliographic Details
Published in: Applied Measurement in Education, Oct. 2004, Vol. 17 (4), pp. 323-357
Authors: Williamson, David M.; Bejar, Isaac I.; Sax, Anne
Format: Article
Language: English
Online Access: Full text
Abstract: As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this article explores the potential utility of Classification and Regression Trees (CART) and Kohonen Self-Organizing Maps (SOM) as tools to facilitate subject matter expert (SME) examination of the fine-grained (feature-level) quality of automated scores for complex data, with implications for the validity of resultant scores. This article explores both supervised and unsupervised learning techniques, with the former represented by CART (Breiman, Friedman, Olshen, & Stone, 1984) and the latter by SOM (Kohonen, 1989). Three applications comprise this investigation, the first of which suggests that CART can facilitate efficient and economical identification of specific elements of complex responses that contribute to discrepancies between automated and human scores. The second application builds on the first by exploring CART for efficiently and accurately automating case selection for human intervention to ensure score validity. The final application explores the potential for SOM to reduce the need for SMEs in evaluating automated scoring. Although both supervised and unsupervised methodologies examined were found to be promising tools for facilitating SME roles in maintaining and improving the quality of automated scoring, such applications remain unproven, and further studies are necessary to establish the reliability of these techniques.
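The article itself contains no code, but the core mechanism it applies is CART's recursive partitioning: finding the feature/threshold split that best separates responses where automated and human scores agree from those where they disagree. The sketch below implements just that single split-finding step in plain Python, using Gini impurity as in Breiman et al. (1984). The feature names and data are hypothetical, invented here for illustration; they are not from the ARE vignette.

```python
# Illustrative sketch (not from the article): the core CART step of
# choosing one feature/threshold split that best separates responses
# where automated and human scores agree (label 0) from those where
# they disagree (label 1). Features and data below are hypothetical.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold) minimizing weighted Gini impurity."""
    n = len(rows)
    best = (None, None, float("inf"))
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [labels[i] for i in range(n) if rows[i][j] <= t]
            right = [labels[i] for i in range(n) if rows[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

# Hypothetical feature vectors per response: [door_count, wall_overlap, area_ratio]
rows = [[1, 0.9, 0.5], [1, 0.1, 0.6], [2, 0.8, 0.7], [2, 0.2, 0.4]]
labels = [1, 0, 1, 0]  # 1 = automated and human scores disagreed

feature, threshold = best_split(rows, labels)
print(feature, threshold)  # → 1 0.2
```

Here the split lands on the second feature: all discrepant cases have wall_overlap above 0.2. A full CART tree applies this step recursively to each side of the split, and the resulting thresholds give SMEs concrete, feature-level conditions under which the automated scorer tends to diverge from human graders, which is the diagnostic role the article proposes.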
ISSN: 0895-7347, 1532-4818
DOI: 10.1207/s15324818ame1704_1