Permutation tests are robust and powerful at 0.5% and 5% significance levels


Bibliographic details
Published in:	Behavior Research Methods 2021-12, Vol.53 (6), p.2712-2724
Main authors:	Noguchi, Kimihiro; Konietschke, Frank; Marmolejo-Ramos, Fernando; Pauly, Markus
Format:	Article
Language:	English
Subjects:	Behavioral Science and Psychology; Cognitive Psychology; Computer Simulation; False Positive Reactions; Human acts; Human behavior; Humans; Models, Statistical; Nonparametric statistics; Probability; Psychology; Reproducibility; Statistical analysis; Statistical Distributions; Statistical significance
Online access:	Full text

Description
Summary:	Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.
ISSN:	1554-3528
DOI:	10.3758/s13428-021-01595-5
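
The summary names the permutation version of the Welch t-test but does not show how such a test is computed. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a two-sided test, random (Monte Carlo) permutations instead of full enumeration, and an add-one correction to the p-value. The names `welch_t` and `permutation_welch_test` and the parameters `n_perm` and `seed` are hypothetical and chosen only for this example.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic: mean difference over the unpooled standard error.

    Each sample needs at least two observations.
    """
    diff = mean(x) - mean(y)
    se2 = variance(x) / len(x) + variance(y) / len(y)
    if se2 == 0:
        # Assumption for this sketch: both groups constant.
        # Equal means give 0, unequal means give an infinite statistic.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def permutation_welch_test(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the Welch t statistic."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    # Add-one correction: the p-value can never be exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = [4.1, 5.0, 5.6, 6.2, 6.8, 7.4, 8.3, 9.0]
    group_b = [6.9, 7.7, 8.4, 9.1, 9.9, 10.6, 11.8, 14.5]
    p = permutation_welch_test(group_a, group_b)
    print(f"permutation p-value: {p:.4f}")
    print("significant at 0.05: ", p <= 0.05)
    print("significant at 0.005:", p <= 0.005)
```

With the add-one correction the smallest attainable p-value is 1/(n_perm + 1), so at least 199 permutations are needed before a result can reach the 0.005 level at all; in practice a much larger number is used to keep Monte Carlo error small.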