Permutation tests are robust and powerful at 0.5% and 5% significance levels
Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson ( Proceedings of the National Academy of Sciences , 110 , 19313–19317, 2013 ) and Benjamin et al. ( Nature Human Behaviour , 2 , 6–10 2018 ) recommend usi...
Published in: | Behavior Research Methods 2021-12, Vol.53 (6), p.2712-2724 |
---|---|
Main authors: | Noguchi, Kimihiro; Konietschke, Frank; Marmolejo-Ramos, Fernando; Pauly, Markus |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 2724 |
---|---|
container_issue | 6 |
container_start_page | 2712 |
container_title | Behavior Research Methods |
container_volume | 53 |
creator | Noguchi, Kimihiro; Konietschke, Frank; Marmolejo-Ramos, Fernando; Pauly, Markus |
description | Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity. |
doi_str_mv | 10.3758/s13428-021-01595-5 |
format | Article |
fullrecord | <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_2534610620</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A713917804</galeid><sourcerecordid>A713917804</sourcerecordid><originalsourceid>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</originalsourceid><addsrcrecordid>eNp9kU9r3DAQxUVpSdJNv0APxVACuXgraSRbPoaQ_oGF9tCchSyPFgVb2kh2Qr59lThNSw9FhxEzvzc85hHyntEttFJ9ygwEVzXlrKZMdrKWr8gJk1LUILl6_df_mLzN-YZSUJyJI3IMgkoqoDkhux-YpmU2s4-hmjHPuTIJqxT7Jc-VCUN1iPeY3DJWZq7oVp49NUvJfh-889YEi9WIdzjmU_LGmTHju-e6Idefr35efq133798u7zY1VaoZq4HK4ahF0404HpnKfBOgUSurJODpS3KjplW8Q6tgIFx6KTtlAPlGm461cOGnK97DyneLsW0nny2OI4mYFyy5hJEw2jDaUE__oPexCWF4k7zhhYAePtIbVdqb0bUPrg4J2PLG3DyNgZ0vvQvWgYda1W53IbwVWBTzDmh04fkJ5MeNKP6MRy9hqNLOPopHC2L6MOzl6WfcHiR_E6jALACuYzCHtMfs_9Z-wtiIJeE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2602033270</pqid></control><display><type>article</type><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><source>MEDLINE</source><source>Springer Online Journals Complete</source><creator>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</creator><creatorcontrib>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</creatorcontrib><description>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (
Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t
-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</description><identifier>ISSN: 1554-3528</identifier><identifier>EISSN: 1554-3528</identifier><identifier>DOI: 10.3758/s13428-021-01595-5</identifier><identifier>PMID: 34050436</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Behavioral Science and Psychology ; Cognitive Psychology ; Computer Simulation ; False Positive Reactions ; Human acts ; Human behavior ; Humans ; Models, Statistical ; Nonparametric statistics ; Probability ; Psychology ; Reproducibility ; Statistical analysis ; Statistical Distributions ; Statistical significance</subject><ispartof>Behavior Research Methods, 2021-12, Vol.53 (6), p.2712-2724</ispartof><rights>The Psychonomic Society, Inc. 2021</rights><rights>2021. The Psychonomic Society, Inc.</rights><rights>COPYRIGHT 2021 Springer</rights><rights>The Psychonomic Society, Inc. 
2021.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</citedby><cites>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</cites><orcidid>0000-0002-5904-9568</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.3758/s13428-021-01595-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.3758/s13428-021-01595-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27915,27916,41479,42548,51310</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34050436$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Noguchi, Kimihiro</creatorcontrib><creatorcontrib>Konietschke, Frank</creatorcontrib><creatorcontrib>Marmolejo-Ramos, Fernando</creatorcontrib><creatorcontrib>Pauly, Markus</creatorcontrib><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><title>Behavior Research Methods</title><addtitle>Behav Res</addtitle><addtitle>Behav Res Methods</addtitle><description>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (
Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t
-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</description><subject>Behavioral Science and Psychology</subject><subject>Cognitive Psychology</subject><subject>Computer Simulation</subject><subject>False Positive Reactions</subject><subject>Human acts</subject><subject>Human behavior</subject><subject>Humans</subject><subject>Models, Statistical</subject><subject>Nonparametric statistics</subject><subject>Probability</subject><subject>Psychology</subject><subject>Reproducibility</subject><subject>Statistical analysis</subject><subject>Statistical Distributions</subject><subject>Statistical significance</subject><issn>1554-3528</issn><issn>1554-3528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU9r3DAQxUVpSdJNv0APxVACuXgraSRbPoaQ_oGF9tCchSyPFgVb2kh2Qr59lThNSw9FhxEzvzc85hHyntEttFJ9ygwEVzXlrKZMdrKWr8gJk1LUILl6_df_mLzN-YZSUJyJI3IMgkoqoDkhux-YpmU2s4-hmjHPuTIJqxT7Jc-VCUN1iPeY3DJWZq7oVp49NUvJfh-889YEi9WIdzjmU_LGmTHju-e6Idefr35efq133798u7zY1VaoZq4HK4ahF0404HpnKfBOgUSurJODpS3KjplW8Q6tgIFx6KTtlAPlGm461cOGnK97DyneLsW0nny2OI4mYFyy5hJEw2jDaUE__oPexCWF4k7zhhYAePtIbVdqb0bUPrg4J2PLG3DyNgZ0vvQvWgYda1W53IbwVWBTzDmh04fkJ5MeNKP6MRy9hqNLOPopHC2L6MOzl6WfcHiR_E6jALACuYzCHtMfs_9Z-wtiIJeE</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Noguchi, Kimihiro</creator><creator>Konietschke, Frank</creator><creator>Marmolejo-Ramos, Fernando</creator><creator>Pauly, Markus</creator><general>Springer US</general><general>Springer</general><general>Springer Nature 
B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IAO</scope><scope>4T-</scope><scope>7TK</scope><scope>K9.</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-5904-9568</orcidid></search><sort><creationdate>20211201</creationdate><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><author>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Behavioral Science and Psychology</topic><topic>Cognitive Psychology</topic><topic>Computer Simulation</topic><topic>False Positive Reactions</topic><topic>Human acts</topic><topic>Human behavior</topic><topic>Humans</topic><topic>Models, Statistical</topic><topic>Nonparametric statistics</topic><topic>Probability</topic><topic>Psychology</topic><topic>Reproducibility</topic><topic>Statistical analysis</topic><topic>Statistical Distributions</topic><topic>Statistical significance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Noguchi, Kimihiro</creatorcontrib><creatorcontrib>Konietschke, Frank</creatorcontrib><creatorcontrib>Marmolejo-Ramos, Fernando</creatorcontrib><creatorcontrib>Pauly, Markus</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale Academic OneFile</collection><collection>Docstoc</collection><collection>Neurosciences Abstracts</collection><collection>ProQuest Health & Medical 
Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><jtitle>Behavior Research Methods</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Noguchi, Kimihiro</au><au>Konietschke, Frank</au><au>Marmolejo-Ramos, Fernando</au><au>Pauly, Markus</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Permutation tests are robust and powerful at 0.5% and 5% significance levels</atitle><jtitle>Behavior Research Methods</jtitle><stitle>Behav Res</stitle><addtitle>Behav Res Methods</addtitle><date>2021-12-01</date><risdate>2021</risdate><volume>53</volume><issue>6</issue><spage>2712</spage><epage>2724</epage><pages>2712-2724</pages><issn>1554-3528</issn><eissn>1554-3528</eissn><abstract>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (
Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t
-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</abstract><cop>New York</cop><pub>Springer US</pub><pmid>34050436</pmid><doi>10.3758/s13428-021-01595-5</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-5904-9568</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1554-3528 |
ispartof | Behavior Research Methods, 2021-12, Vol.53 (6), p.2712-2724 |
issn | 1554-3528 1554-3528 |
language | eng |
recordid | cdi_proquest_miscellaneous_2534610620 |
source | MEDLINE; Springer Online Journals Complete |
subjects | Behavioral Science and Psychology; Cognitive Psychology; Computer Simulation; False Positive Reactions; Human acts; Human behavior; Humans; Models, Statistical; Nonparametric statistics; Probability; Psychology; Reproducibility; Statistical analysis; Statistical Distributions; Statistical significance |
title | Permutation tests are robust and powerful at 0.5% and 5% significance levels |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T02%3A06%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Permutation%20tests%20are%20robust%20and%20powerful%20at%200.5%25%20and%205%25%20significance%20levels&rft.jtitle=Behavior%20Research%20Methods&rft.au=Noguchi,%20Kimihiro&rft.date=2021-12-01&rft.volume=53&rft.issue=6&rft.spage=2712&rft.epage=2724&rft.pages=2712-2724&rft.issn=1554-3528&rft.eissn=1554-3528&rft_id=info:doi/10.3758/s13428-021-01595-5&rft_dat=%3Cgale_proqu%3EA713917804%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2602033270&rft_id=info:pmid/34050436&rft_galeid=A713917804&rfr_iscdi=true |
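
The abstract refers to a permutation version of the Welch t-test. As a rough illustration of what such a test can look like, the following Python sketch recomputes the Welch t statistic on random reshufflings of the pooled sample and reports the share of reshufflings that are at least as extreme as the observed value. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `welch_t` and `permutation_welch_test`, the default of 10,000 random permutations, the two-sided alternative, and the add-one p-value correction are all choices made here for illustration and are not taken from the record.

```python
import numpy as np


def welch_t(x, y):
    """Welch t statistic for two independent samples (unequal variances)."""
    nx, ny = len(x), len(y)
    # Standard error built from the two separate sample variances.
    se = np.sqrt(np.var(x, ddof=1) / nx + np.var(y, ddof=1) / ny)
    return (np.mean(x) - np.mean(y)) / se


def permutation_welch_test(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test based on the Welch t statistic.

    Hypothetical helper for illustration only. Returns the observed
    statistic and an approximate p-value.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = welch_t(x, y)
    pooled = np.concatenate([x, y])
    nx = len(x)

    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        perm = rng.permutation(pooled)
        t_perm = welch_t(perm[:nx], perm[nx:])
        if abs(t_perm) >= abs(observed):
            count += 1

    # Add-one correction: the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    a = rng.normal(loc=0.0, scale=1.0, size=20)
    b = rng.normal(loc=1.0, scale=3.0, size=30)
    t_obs, p = permutation_welch_test(a, b)
    print(f"Welch t = {t_obs:.3f}, permutation p = {p:.4f}")
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so a decision at α = 0.005 needs at least a few hundred permutations, and considerably more for a stable estimate. The sketch assumes data without zero sample variance in any reshuffled group; heavily tied or constant data would need extra handling.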