Rating Scales Derived From Student Samples: Effects of the Scale Maker and the Student Sample on Scale Content and Student Scores

Bibliographic details
Published in:	TESOL Quarterly 2002, Vol. 36 (1), p. 49-70
Main authors:	Turner, Carolyn E.; Upshur, John A.
Format:	Article
Language:	English
Subjects:	Art songs Cluster analysis English (Second Language) Language Performance tests Principal components analysis Rating Scales Saliency Scores Second Language Instruction Second Language Learning Student Evaluation Writing assignments Writing Evaluation Writing instruction Written composition

Description
Abstract:	Performance tests typically require raters to judge the quality of examinees' written or spoken language relative to a rating scale; therefore, scores may be affected by variables inherent in the specific scale development process. In this study we consider two variables in empirically derived rating scales that have not been investigated to date: scale developers and the sample of performances used by the scale developers. These variables may affect scale content and structure and (ultimately) final test scores. This study examined the development and use of scales using two samples of ESL student writing and three teams of rating scale developers to construct three empirically derived scales. A comparison of the scale content showed considerable variation even though all development teams used similar constructs of writing ability. Each team used its own scale to rate a different set of compositions. Comparison of the ratings showed that scale development team had a minor effect on ratings and that scale development sample had a major effect. We present implications of these findings on the nature of empirically derived rating scales, focusing particularly on how such scales are developed.
ISSN:	0039-8322, 1545-7249
DOI:	10.2307/3588360