The impact of task duration on the scoring of independent writing responses of adult L2-English writers

Detailed description

Bibliographic details
Published in:	Assessing Writing, 2024-10, Vol. 62, p. 100895, Article 100895
Main authors:	Naismith, Ben; Attali, Yigal; LaFlair, Geoffrey T.
Format:	Article
Language:	English
Subjects:	Criterion validity; Task duration; Test-retest reliability; Writing assessment
Online access:	Full text

Description
Summary:	In writing assessment, there is inherently a tension between authenticity and practicality: tasks with longer durations may more closely reflect real-life writing processes but are less feasible to administer and score. What is more, given a fixed total testing time, there is necessarily a trade-off between task duration and number of tasks. Traditionally, high-stakes assessments have managed this trade-off by administering one or two writing tasks per test, allowing 20–40 minutes per task. However, research on second language (L2) English writing has not found longer task durations to significantly improve score validity or reliability. Importantly, very few studies have compared much shorter durations for writing tasks to more traditional allotments. To explore this issue, we asked adult L2-English test takers to respond to two writing prompts with either 5-minute or 20-minute time limits. Responses were then evaluated by expert human raters and an automated writing evaluation tool. Regardless of scoring method, short-duration scores showed test-retest reliability and criterion validity as high as those of long-duration scores. As expected, longer task duration yielded higher scores, but regardless of duration, test takers demonstrated the entire spectrum of writing proficiency. Implications for writing assessment are discussed in relation to scoring practices and task design.
• Longer writing tasks do not have higher test-retest reliability than shorter ones.
• Longer writing tasks do not have higher criterion validity than shorter ones.
• The impact of task duration is not mediated by scoring method (human or machine).
ISSN:	1075-2935
DOI:	10.1016/j.asw.2024.100895