Permutation tests are robust and powerful at 0.5% and 5% significance levels

Bibliographic details
Published in: Behavior Research Methods 2021-12, Vol. 53 (6), p. 2712-2724
Main authors: Noguchi, Kimihiro; Konietschke, Frank; Marmolejo-Ramos, Fernando; Pauly, Markus
Format: Article
Language: English
Description
Summary: The recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful, while the commonly used two-sample tests that rely on the t-distribution tend to be either liberal or conservative and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.
ISSN:1554-3528
DOI:10.3758/s13428-021-01595-5
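
Illustration: the abstract refers to a permutation version of the Welch t-test. The following is a minimal sketch of how such a test can be set up, not the authors' implementation. The function names `welch_t` and `permutation_welch_test`, the Monte Carlo approach, and the defaults `n_perm=2000` and `seed=0` are assumptions made for this example. It also assumes non-degenerate samples (at least two observations per group and nonzero variance).

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch t statistic for two independent samples (variances may differ)."""
    var_x = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    var_y = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(var_x + var_y)


def permutation_welch_test(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value using the Welch t statistic.

    Hypothetical helper for illustration: group labels are reshuffled
    n_perm times and the statistic is recomputed on each reshuffled split.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        if abs(welch_t(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    # Count the observed split itself so the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [0.1, 0.5, 0.3, 0.9, 0.4, 0.7, 0.2, 0.6]
    b = [5.1, 5.6, 5.2, 5.9, 5.4, 5.8, 5.3, 5.5]
    p = permutation_welch_test(a, b)
    print("reject at alpha = 0.005:", p < 0.005)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so testing at α = 0.005 requires at least 200 permutations, and considerably more for a stable estimate.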