Continuous Testing: Unifying Tests and E-values

Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence again...

Bibliographic Details
Main author:	Koning, Nick W
Format:	Article
Language:	eng
Subjects:	Mathematics - Statistics Theory; Statistics - Theory

creator	Koning, Nick W
description	Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence against a hypothesis and do not necessarily intend to establish a definitive conclusion. We propose a continuous generalization of a test, which we use to continuously measure the evidence against a hypothesis. Such a continuous test can be viewed as a continuous and non-randomized interpretation of the classical `randomized test'. This offers the benefits of a randomized test, without the downsides of external randomization. Another interpretation is as a literal measure, which measures the amount of binary tests that reject the hypothesis. Our work unifies classical testing and the recently proposed $e$-values: $e$-values bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$ randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we use to define a level 0 continuous test. Moreover, we generalize the traditional notion of power by using generalized means. This produces a framework that contains both classical Neyman-Pearson optimal testing and log-optimal $e$-values, as well as a continuum of other options. The traditional $p$-value appears as the reciprocal of a generally invalid continuous test. In an illustration in a Gaussian location model, we find that optimal continuous tests are of a beautifully simple form.
doi_str_mv	10.48550/arxiv.2409.05654
format	Article
date	2024-09-09
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2409.05654
language	eng
recordid	cdi_arxiv_primary_2409_05654
source	arXiv.org
subjects	Mathematics - Statistics Theory; Statistics - Theory
title	Continuous Testing: Unifying Tests and E-values
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T20%3A04%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Continuous%20Testing:%20Unifying%20Tests%20and%20E-values&rft.au=Koning,%20Nick%20W&rft.date=2024-09-09&rft_id=info:doi/10.48550/arxiv.2409.05654&rft_dat=%3Carxiv_GOX%3E2409_05654%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
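
The description field above states that $e$-values bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$ randomized tests. The following Python sketch illustrates one way to read that sentence; it is not taken from the paper. It assumes a Gaussian location model with null N(0, 1) and alternative N(mu, 1), uses the likelihood ratio as the $e$-value, and caps it at 1/alpha. All function names (lr_e_value, continuous_test, rejection_probability, simulate_size) are hypothetical, and the capped likelihood ratio is only an example of a valid bounded $e$-value, not necessarily the optimal continuous test the abstract refers to.

```python
import math
import random


def lr_e_value(x: float, mu: float) -> float:
    """Likelihood ratio of N(mu, 1) against N(0, 1) at observation x.

    Under the null N(0, 1) its expectation is 1, so it is an e-value.
    """
    return math.exp(mu * x - 0.5 * mu * mu)


def continuous_test(e: float, alpha: float) -> float:
    """Bound an e-value to [0, 1/alpha] (assumed construction: simple capping).

    alpha = 0 corresponds to the unbounded, regular e-value.
    """
    if alpha == 0:
        return e
    return min(e, 1.0 / alpha)


def rejection_probability(e: float, alpha: float) -> float:
    """Randomized-test reading: reject with probability alpha * (bounded e-value)."""
    return alpha * continuous_test(e, alpha)


def simulate_size(mu: float, alpha: float, n: int, seed: int) -> float:
    """Monte Carlo average of the rejection probability under the null N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += rejection_probability(lr_e_value(x, mu), alpha)
    return total / n


if __name__ == "__main__":
    # The average rejection probability under the null should be close to,
    # and in expectation not above, alpha.
    print(simulate_size(mu=1.0, alpha=0.05, n=20000, seed=0))
```

Because capping can only lower an $e$-value, its expectation under the null stays at most 1, so the implied rejection probability has expectation at most alpha. The bounded value itself can be reported as evidence without drawing an external random number.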