Permutation tests are robust and powerful at 0.5% and 5% significance levels

Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend usi...

Bibliographic details
Published in: Behavior Research Methods 2021-12, Vol.53 (6), p.2712-2724
Main authors: Noguchi, Kimihiro; Konietschke, Frank; Marmolejo-Ramos, Fernando; Pauly, Markus
Format: Article
Language: eng
Online access: Full text
container_end_page 2724
container_issue 6
container_start_page 2712
container_title Behavior Research Methods
container_volume 53
creator Noguchi, Kimihiro
Konietschke, Frank
Marmolejo-Ramos, Fernando
Pauly, Markus
description Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313–19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.
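As an illustration of the technique named in the description, the following is a minimal Python sketch of a permutation version of the Welch t-test for two independent samples. It is not the authors' code: the function names `welch_t` and `permutation_welch_test`, the default number of permutations, the fixed seed, and the add-one p-value correction are all assumptions made for this example.

```python
import numpy as np


def welch_t(x, y):
    """Welch t statistic for two independent samples (unequal variances allowed)."""
    vx = x.var(ddof=1) / x.size  # squared standard error of the mean of x
    vy = y.var(ddof=1) / y.size  # squared standard error of the mean of y
    return (x.mean() - y.mean()) / np.sqrt(vx + vy)


def permutation_welch_test(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value based on the Welch t statistic.

    Hypothetical helper for illustration: group labels are reshuffled
    n_perm times and the statistic is recomputed on each reshuffle.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = abs(welch_t(x, y))

    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # random reassignment to the two groups
        if abs(welch_t(perm[:x.size], perm[x.size:])) >= observed:
            hits += 1

    # Add-one correction (an assumption of this sketch) keeps the p-value above 0.
    return (hits + 1) / (n_perm + 1)
```

With this correction the smallest attainable p-value is 1 / (n_perm + 1), so a test at α = 0.005 needs considerably more than 200 permutations before a rejection is even possible; the default of 10,000 here is an arbitrary choice, not a recommendation from the article.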
doi_str_mv 10.3758/s13428-021-01595-5
format Article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_2534610620</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A713917804</galeid><sourcerecordid>A713917804</sourcerecordid><originalsourceid>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</originalsourceid><addsrcrecordid>eNp9kU9r3DAQxUVpSdJNv0APxVACuXgraSRbPoaQ_oGF9tCchSyPFgVb2kh2Qr59lThNSw9FhxEzvzc85hHyntEttFJ9ygwEVzXlrKZMdrKWr8gJk1LUILl6_df_mLzN-YZSUJyJI3IMgkoqoDkhux-YpmU2s4-hmjHPuTIJqxT7Jc-VCUN1iPeY3DJWZq7oVp49NUvJfh-889YEi9WIdzjmU_LGmTHju-e6Idefr35efq133798u7zY1VaoZq4HK4ahF0404HpnKfBOgUSurJODpS3KjplW8Q6tgIFx6KTtlAPlGm461cOGnK97DyneLsW0nny2OI4mYFyy5hJEw2jDaUE__oPexCWF4k7zhhYAePtIbVdqb0bUPrg4J2PLG3DyNgZ0vvQvWgYda1W53IbwVWBTzDmh04fkJ5MeNKP6MRy9hqNLOPopHC2L6MOzl6WfcHiR_E6jALACuYzCHtMfs_9Z-wtiIJeE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2602033270</pqid></control><display><type>article</type><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><source>MEDLINE</source><source>Springer Online Journals Complete</source><creator>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</creator><creatorcontrib>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</creatorcontrib><description>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson ( Proceedings of the National Academy of Sciences , 110 , 19313–19317, 2013 ) and Benjamin et al. ( Nature Human Behaviour , 2 , 6–10 2018 ) recommend using the significance level of α = 0.005 (0.5 % ) as opposed to the conventional 0.05 (5 % ) level. 
Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t -test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t -distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</description><identifier>ISSN: 1554-3528</identifier><identifier>EISSN: 1554-3528</identifier><identifier>DOI: 10.3758/s13428-021-01595-5</identifier><identifier>PMID: 34050436</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Behavioral Science and Psychology ; Cognitive Psychology ; Computer Simulation ; False Positive Reactions ; Human acts ; Human behavior ; Humans ; Models, Statistical ; Nonparametric statistics ; Probability ; Psychology ; Reproducibility ; Statistical analysis ; Statistical Distributions ; Statistical significance</subject><ispartof>Behavior Research Methods, 2021-12, Vol.53 (6), p.2712-2724</ispartof><rights>The Psychonomic Society, Inc. 2021</rights><rights>2021. The Psychonomic Society, Inc.</rights><rights>COPYRIGHT 2021 Springer</rights><rights>The Psychonomic Society, Inc. 
2021.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</citedby><cites>FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</cites><orcidid>0000-0002-5904-9568</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.3758/s13428-021-01595-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.3758/s13428-021-01595-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27915,27916,41479,42548,51310</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34050436$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Noguchi, Kimihiro</creatorcontrib><creatorcontrib>Konietschke, Frank</creatorcontrib><creatorcontrib>Marmolejo-Ramos, Fernando</creatorcontrib><creatorcontrib>Pauly, Markus</creatorcontrib><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><title>Behavior Research Methods</title><addtitle>Behav Res</addtitle><addtitle>Behav Res Methods</addtitle><description>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson ( Proceedings of the National Academy of Sciences , 110 , 19313–19317, 2013 ) and Benjamin et al. ( Nature Human Behaviour , 2 , 6–10 2018 ) recommend using the significance level of α = 0.005 (0.5 % ) as opposed to the conventional 0.05 (5 % ) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. 
Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t -test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t -distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</description><subject>Behavioral Science and Psychology</subject><subject>Cognitive Psychology</subject><subject>Computer Simulation</subject><subject>False Positive Reactions</subject><subject>Human acts</subject><subject>Human behavior</subject><subject>Humans</subject><subject>Models, Statistical</subject><subject>Nonparametric statistics</subject><subject>Probability</subject><subject>Psychology</subject><subject>Reproducibility</subject><subject>Statistical analysis</subject><subject>Statistical Distributions</subject><subject>Statistical significance</subject><issn>1554-3528</issn><issn>1554-3528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU9r3DAQxUVpSdJNv0APxVACuXgraSRbPoaQ_oGF9tCchSyPFgVb2kh2Qr59lThNSw9FhxEzvzc85hHyntEttFJ9ygwEVzXlrKZMdrKWr8gJk1LUILl6_df_mLzN-YZSUJyJI3IMgkoqoDkhux-YpmU2s4-hmjHPuTIJqxT7Jc-VCUN1iPeY3DJWZq7oVp49NUvJfh-889YEi9WIdzjmU_LGmTHju-e6Idefr35efq133798u7zY1VaoZq4HK4ahF0404HpnKfBOgUSurJODpS3KjplW8Q6tgIFx6KTtlAPlGm461cOGnK97DyneLsW0nny2OI4mYFyy5hJEw2jDaUE__oPexCWF4k7zhhYAePtIbVdqb0bUPrg4J2PLG3DyNgZ0vvQvWgYda1W53IbwVWBTzDmh04fkJ5MeNKP6MRy9hqNLOPopHC2L6MOzl6WfcHiR_E6jALACuYzCHtMfs_9Z-wtiIJeE</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Noguchi, Kimihiro</creator><creator>Konietschke, 
Frank</creator><creator>Marmolejo-Ramos, Fernando</creator><creator>Pauly, Markus</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IAO</scope><scope>4T-</scope><scope>7TK</scope><scope>K9.</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-5904-9568</orcidid></search><sort><creationdate>20211201</creationdate><title>Permutation tests are robust and powerful at 0.5% and 5% significance levels</title><author>Noguchi, Kimihiro ; Konietschke, Frank ; Marmolejo-Ramos, Fernando ; Pauly, Markus</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c486t-dc4ddb4f463fbfc0329835e28cf5dc07e591a7829ec43d12395c98f38f62a98b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Behavioral Science and Psychology</topic><topic>Cognitive Psychology</topic><topic>Computer Simulation</topic><topic>False Positive Reactions</topic><topic>Human acts</topic><topic>Human behavior</topic><topic>Humans</topic><topic>Models, Statistical</topic><topic>Nonparametric statistics</topic><topic>Probability</topic><topic>Psychology</topic><topic>Reproducibility</topic><topic>Statistical analysis</topic><topic>Statistical Distributions</topic><topic>Statistical significance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Noguchi, Kimihiro</creatorcontrib><creatorcontrib>Konietschke, Frank</creatorcontrib><creatorcontrib>Marmolejo-Ramos, Fernando</creatorcontrib><creatorcontrib>Pauly, Markus</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale Academic OneFile</collection><collection>Docstoc</collection><collection>Neurosciences Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><jtitle>Behavior Research Methods</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Noguchi, Kimihiro</au><au>Konietschke, Frank</au><au>Marmolejo-Ramos, Fernando</au><au>Pauly, Markus</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Permutation tests are robust and powerful at 0.5% and 5% significance levels</atitle><jtitle>Behavior Research Methods</jtitle><stitle>Behav Res</stitle><addtitle>Behav Res Methods</addtitle><date>2021-12-01</date><risdate>2021</risdate><volume>53</volume><issue>6</issue><spage>2712</spage><epage>2724</epage><pages>2712-2724</pages><issn>1554-3528</issn><eissn>1554-3528</eissn><abstract>Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson ( Proceedings of the National Academy of Sciences , 110 , 19313–19317, 2013 ) and Benjamin et al. ( Nature Human Behaviour , 2 , 6–10 2018 ) recommend using the significance level of α = 0.005 (0.5 % ) as opposed to the conventional 0.05 (5 % ) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. 
Through an extensive simulation study, it is found that the permutation versions of the Welch t -test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t -distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.</abstract><cop>New York</cop><pub>Springer US</pub><pmid>34050436</pmid><doi>10.3758/s13428-021-01595-5</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-5904-9568</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1554-3528
ispartof Behavior Research Methods, 2021-12, Vol.53 (6), p.2712-2724
issn 1554-3528
1554-3528
language eng
recordid cdi_proquest_miscellaneous_2534610620
source MEDLINE; Springer Online Journals Complete
subjects Behavioral Science and Psychology
Cognitive Psychology
Computer Simulation
False Positive Reactions
Human acts
Human behavior
Humans
Models, Statistical
Nonparametric statistics
Probability
Psychology
Reproducibility
Statistical analysis
Statistical Distributions
Statistical significance
title Permutation tests are robust and powerful at 0.5% and 5% significance levels
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T02%3A06%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Permutation%20tests%20are%20robust%20and%20powerful%20at%200.5%25%20and%205%25%20significance%20levels&rft.jtitle=Behavior%20Research%20Methods&rft.au=Noguchi,%20Kimihiro&rft.date=2021-12-01&rft.volume=53&rft.issue=6&rft.spage=2712&rft.epage=2724&rft.pages=2712-2724&rft.issn=1554-3528&rft.eissn=1554-3528&rft_id=info:doi/10.3758/s13428-021-01595-5&rft_dat=%3Cgale_proqu%3EA713917804%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2602033270&rft_id=info:pmid/34050436&rft_galeid=A713917804&rfr_iscdi=true