The use of test method characteristics in the content analysis and design of EFL proficiency tests

Bibliographic details
Published in: Language Testing 1996-07, Vol. 13 (2), p. 125-150
Main authors: Bachman, Lyle F., Davidson, Fred, Milanovic, Michael
Format: Article
Language: English
Online access: Full text
Description
Abstract: Content considerations are widely viewed to be essential in the design of language tests, and evidence of content relevance and coverage provides an important component in the validation of score interpretations. Content analysis can be viewed as the application of a model of test design to a particular measurement instrument, using judgements of trained analysts. Following Bachman (1990), a content analysis of test method characteristics and components of communicative language ability was performed by five raters on six forms of an EFL test from the University of Cambridge Local Examinations Syndicate. To investigate rater agreement, generalizability analysis and a new agreement statistic (the rater agreement proportion or 'RAP') were used. Results indicate that the overall level of rater agreement was very high, and that raters were more consistent in rating method than ability. To examine interform comparability, method/ability content analysis characteristics (called 'facets') which differed by more than one standard deviation of either form were deemed to be salient. Results indicated that not all facets yielded substantive information about interform content comparability, although certain test characteristics could be targeted for further revision and development. The relationships between content analysis ratings and two-parameter IRT item parameter estimates (difficulty and discrimination) were also investigated. Neither test method nor ability ratings by themselves yielded consistent predictions of either item discrimination or difficulty across the six forms examined. Fairly high predictions were consistently obtained, however, when method and ability ratings were combined. The implications of these findings, as well as the utility of content analysis in operational test development, are discussed.
ISSN: 0265-5322, 1477-0946
DOI: 10.1177/026553229601300201