Proposal for modifying procedures for declaring significant trends in TIMSS

Bibliographic details
Published in: Large-scale assessments in education 2025-12, Vol. 13 (1), p. 2
Main authors: Braun, Henry Isaiah; von Davier, Matthias; Chen, Jihang
Format: Article
Language: English
Online access: Full text
Description
Abstract: International large-scale assessments (ILSA) are an important source of information for education policymakers across the globe. Despite sponsors’ warnings, when results are published, media attention focuses on country rankings and changes in scores. Score changes are evaluated using a two-sided z-statistic, with statistical significance declared if the statistic exceeds 1.96 in absolute value. Findings of significance occasion much commentary and may also have policy implications. Particularly problematic, however, are cases in which a significant trend in one direction is followed in the next cycle by a significant trend in the opposite direction. Such reporting reversals are often difficult to explain on substantive grounds and, consequently, can undermine the credibility of the ILSA. This article proposes and evaluates a new approach to determining the significance of observed score changes. It employs a two one-sided tests (TOST) procedure. A key feature is establishing an equivalence zone around zero. A change is declared only if the test statistic does not fall in the equivalence zone. Thus, the procedure combines consideration of statistical significance and substantive importance. To address the problem of multiplicity, we augment the TOST procedure with the Benjamini–Hochberg procedure (BH-TOST), which controls the False Discovery Rate. Using data from TIMSS (2011, 2015, 2019), we explore different parameter choices for BH-TOST and evaluate the operating characteristics of the selected version with respect to three different criteria. Our results indicate that BH-TOST is generally superior to the current procedure, and we conclude that it merits serious consideration as a basis for reporting TIMSS results.
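The contrast described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: the way the two one-sided p-values are combined, the equivalence margin of 5 score points, the FDR level of 0.05, and the four (change, standard error) pairs are all hypothetical values chosen for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_test_significant(diff, se, crit=1.96):
    """Current rule from the abstract: two-sided z-test against zero."""
    return abs(diff / se) > crit

def min_effect_p_value(diff, se, margin):
    """Assumed combination of two one-sided tests for H0: |true change| <= margin.

    Taking the smaller of the two one-sided p-values is an assumption of
    this sketch; the article may define its statistic differently.
    """
    p_increase = 1.0 - norm_cdf((diff - margin) / se)  # H1: change > +margin
    p_decrease = norm_cdf((diff + margin) / se)        # H1: change < -margin
    return min(p_increase, p_decrease)

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns one reject flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank whose p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Hypothetical (score change, standard error) pairs, NOT TIMSS data.
changes = [(12.0, 3.0), (7.0, 3.0), (-15.0, 4.0), (2.0, 3.5)]
margin = 5.0  # hypothetical half-width of the equivalence zone

z_flags = [z_test_significant(d, se) for d, se in changes]
p_vals = [min_effect_p_value(d, se, margin) for d, se in changes]
bh_flags = benjamini_hochberg(p_vals, q=0.05)

print(z_flags)   # [True, True, True, False]
print(bh_flags)  # [True, False, True, False]
```

In this toy example the second change (7 points, standard error 3) is significant under the plain z-test but is not declared once an equivalence zone and the multiplicity correction are applied, which is the kind of small change the proposed approach is meant to hold back.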
ISSN: 2196-0739
DOI: 10.1186/s40536-025-00236-z