Sources of Score Scale Inconsistency. Research Report. ETS RR-11-35

Bibliographic Details
Published in:	Educational Testing Service 2011
Main Authors:	Fife, James H; Graf, Edith Aurora; Ohls, Sarah
Format:	Report
Language:	English
Subjects:	Construct Validity; Mathematics Tests; Responses; Test Construction

Description
Summary:	Six tasks, selected from assessments administered in 2007 as part of the Cognitively-Based Assessments of, for, and as Learning (CBAL) project, were revised in an effort to remove difficulties with the tasks that were unrelated to the construct being assessed. Because the revised tasks were piloted on a different population from the original tasks, it was not possible to make direct comparisons between the performance of the revised tasks and that of the original tasks, other than to make a qualitative assessment of whether or not the nonconstruct difficulties had, in fact, been removed. But we were able to pilot between 2 and 4 versions of each revised task, and we could compare the performance of our pilot sample on the various versions of each task. For Mix It Up, we prepared 2 nonparallel versions: the first attempted to preserve the construct-related difficulty of the original while removing the nonconstruct-related ambiguities, and the second was intended to be an easier task that measured the same skills and abilities. For Fruit Drink and Paste we created 4 versions, carefully varying different aspects of the language while keeping other aspects constant. For the 2 tasks from Bigfoot, we varied 2 features independently, creating 2 versions of each feature and therefore 4 versions of each task. Finally, for Forest Carbon, we created 4 versions, varying from unscaffolded to carefully scaffolded. Because the revision of each task was its own experiment, the analysis of each task, and our conclusions from that analysis, are described separately. Appended are: (1) Revised Items; and (2) Codes Used in Scoring Tasks and Frequency of Response Data. (Contains 29 figures and 36 tables.)