Investigating variability in tasks and rater judgements in a performance test of foreign language speaking
Published in: | Language Testing 1995-07, Vol. 12 (2), pp. 238-257 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Summary: | Much of the recent debate that has surrounded the development and use of 'performance', or 'communicative', language tests has focused on a supposed trade-off between two sets of desirable qualities: correspondence of test tasks and test performance to nontest language use, for content relevance; and reliability of scores derived from test performance. One area that has been of particular concern with performance tests is the potential variability in tasks and rater judgements, and this has been investigated in the language testing literature with two complementary approaches: generalizability theory and many-faceted Rasch modelling. GENOVA, which performs generalizability theory analyses, estimates the relative contribution of variation in test tasks and rater judgements to variation in test scores. FACETS, which performs many-faceted Rasch modelling, estimates differences in task difficulty and rater severity, and adjusts ability estimates of test takers, taking these differences into account. In this article we first discuss the design and development of a foreign language (Spanish) test battery that was designed for two purposes: first, to place University of California Education Abroad students into programmes at universities abroad that are appropriate for their level of language ability, and secondly, to provide diagnostic information that will be useful for designing appropriate teaching and learning programmes for prospective education abroad students. The test battery consists of four subtests: reading, listening and note-taking, speaking, and writing. All subtests share a common theme or topic, and are interdependent. We then discuss the results of the GENOVA and FACETS analyses of the speaking subtest, based on a full field trial with a group of University of California undergraduate students who had been selected for participation in the Education Abroad Program. Finally, we discuss the implications of these results for the use of G-theory and many-faceted Rasch modelling for the development of performance tests of foreign language ability. |
---|---|
ISSN: | 0265-5322, 1477-0946 |
DOI: | 10.1177/026553229501200206 |