Continuous Testing: Unifying Tests and E-values
Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence again...
Main Author: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Koning, Nick W |
description | Testing has developed into the fundamental statistical framework for
falsifying hypotheses. Unfortunately, tests are binary in nature: a test either
rejects a hypothesis or not. Such binary decisions do not reflect the reality
of many scientific studies, which often aim to present the evidence against a
hypothesis and do not necessarily intend to establish a definitive conclusion.
We propose a continuous generalization of a test, which we use to continuously
measure the evidence against a hypothesis. Such a continuous test can be viewed
as a continuous and non-randomized interpretation of the classical `randomized
test'. This offers the benefits of a randomized test, without the downsides of
external randomization. Another interpretation is as a literal measure, which
measures the amount of binary tests that reject the hypothesis. Our work
unifies classical testing and the recently proposed $e$-values: $e$-values
bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$
randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we
use to define a level 0 continuous test. Moreover, we generalize the
traditional notion of power by using generalized means. This produces a
framework that contains both classical Neyman-Pearson optimal testing and
log-optimal $e$-values, as well as a continuum of other options. The
traditional $p$-value appears as the reciprocal of a generally invalid
continuous test. In an illustration in a Gaussian location model, we find that
optimal continuous tests are of a beautifully simple form. |
doi_str_mv | 10.48550/arxiv.2409.05654 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2409_05654</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2409_05654</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2409_056543</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DMwNTM14WTQd87PK8nMK80vLVYISS0GMtOtFELzMtMqgSywSLFCYl6KgqtuWWJOaWoxDwNrWmJOcSovlOZmkHdzDXH20AUbHV9QlJmbWFQZD7IiHmyFMWEVAAzLMEw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Continuous Testing: Unifying Tests and E-values</title><source>arXiv.org</source><creator>Koning, Nick W</creator><creatorcontrib>Koning, Nick W</creatorcontrib><description>Testing has developed into the fundamental statistical framework for
falsifying hypotheses. Unfortunately, tests are binary in nature: a test either
rejects a hypothesis or not. Such binary decisions do not reflect the reality
of many scientific studies, which often aim to present the evidence against a
hypothesis and do not necessarily intend to establish a definitive conclusion.
We propose a continuous generalization of a test, which we use to continuously
measure the evidence against a hypothesis. Such a continuous test can be viewed
as a continuous and non-randomized interpretation of the classical `randomized
test'. This offers the benefits of a randomized test, without the downsides of
external randomization. Another interpretation is as a literal measure, which
measures the amount of binary tests that reject the hypothesis. Our work
unifies classical testing and the recently proposed $e$-values: $e$-values
bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$
randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we
use to define a level 0 continuous test. Moreover, we generalize the
traditional notion of power by using generalized means. This produces a
framework that contains both classical Neyman-Pearson optimal testing and
log-optimal $e$-values, as well as a continuum of other options. The
traditional $p$-value appears as the reciprocal of a generally invalid
continuous test. In an illustration in a Gaussian location model, we find that
optimal continuous tests are of a beautifully simple form.</description><identifier>DOI: 10.48550/arxiv.2409.05654</identifier><language>eng</language><subject>Mathematics - Statistics Theory ; Statistics - Theory</subject><creationdate>2024-09</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2409.05654$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2409.05654$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Koning, Nick W</creatorcontrib><title>Continuous Testing: Unifying Tests and E-values</title><description>Testing has developed into the fundamental statistical framework for
falsifying hypotheses. Unfortunately, tests are binary in nature: a test either
rejects a hypothesis or not. Such binary decisions do not reflect the reality
of many scientific studies, which often aim to present the evidence against a
hypothesis and do not necessarily intend to establish a definitive conclusion.
We propose a continuous generalization of a test, which we use to continuously
measure the evidence against a hypothesis. Such a continuous test can be viewed
as a continuous and non-randomized interpretation of the classical `randomized
test'. This offers the benefits of a randomized test, without the downsides of
external randomization. Another interpretation is as a literal measure, which
measures the amount of binary tests that reject the hypothesis. Our work
unifies classical testing and the recently proposed $e$-values: $e$-values
bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$
randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we
use to define a level 0 continuous test. Moreover, we generalize the
traditional notion of power by using generalized means. This produces a
framework that contains both classical Neyman-Pearson optimal testing and
log-optimal $e$-values, as well as a continuum of other options. The
traditional $p$-value appears as the reciprocal of a generally invalid
continuous test. In an illustration in a Gaussian location model, we find that
optimal continuous tests are of a beautifully simple form.</description><subject>Mathematics - Statistics Theory</subject><subject>Statistics - Theory</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DMwNTM14WTQd87PK8nMK80vLVYISS0GMtOtFELzMtMqgSywSLFCYl6KgqtuWWJOaWoxDwNrWmJOcSovlOZmkHdzDXH20AUbHV9QlJmbWFQZD7IiHmyFMWEVAAzLMEw</recordid><startdate>20240909</startdate><enddate>20240909</enddate><creator>Koning, Nick W</creator><scope>AKZ</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20240909</creationdate><title>Continuous Testing: Unifying Tests and E-values</title><author>Koning, Nick W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2409_056543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Mathematics - Statistics Theory</topic><topic>Statistics - Theory</topic><toplevel>online_resources</toplevel><creatorcontrib>Koning, Nick W</creatorcontrib><collection>arXiv Mathematics</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Koning, Nick W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Continuous Testing: Unifying Tests and E-values</atitle><date>2024-09-09</date><risdate>2024</risdate><abstract>Testing has developed into the fundamental statistical framework for
falsifying hypotheses. Unfortunately, tests are binary in nature: a test either
rejects a hypothesis or not. Such binary decisions do not reflect the reality
of many scientific studies, which often aim to present the evidence against a
hypothesis and do not necessarily intend to establish a definitive conclusion.
We propose a continuous generalization of a test, which we use to continuously
measure the evidence against a hypothesis. Such a continuous test can be viewed
as a continuous and non-randomized interpretation of the classical `randomized
test'. This offers the benefits of a randomized test, without the downsides of
external randomization. Another interpretation is as a literal measure, which
measures the amount of binary tests that reject the hypothesis. Our work
unifies classical testing and the recently proposed $e$-values: $e$-values
bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$
randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we
use to define a level 0 continuous test. Moreover, we generalize the
traditional notion of power by using generalized means. This produces a
framework that contains both classical Neyman-Pearson optimal testing and
log-optimal $e$-values, as well as a continuum of other options. The
traditional $p$-value appears as the reciprocal of a generally invalid
continuous test. In an illustration in a Gaussian location model, we find that
optimal continuous tests are of a beautifully simple form.</abstract><doi>10.48550/arxiv.2409.05654</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2409.05654 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2409_05654 |
source | arXiv.org |
subjects | Mathematics - Statistics Theory ; Statistics - Theory |
title | Continuous Testing: Unifying Tests and E-values |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T20%3A04%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Continuous%20Testing:%20Unifying%20Tests%20and%20E-values&rft.au=Koning,%20Nick%20W&rft.date=2024-09-09&rft_id=info:doi/10.48550/arxiv.2409.05654&rft_dat=%3Carxiv_GOX%3E2409_05654%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
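
As a rough illustration of the abstract's claim that $e$-values bounded to $[0, 1/\alpha]$ correspond to size $\alpha$ tests, the sketch below rescales a one-sided z-test in a Gaussian location model by $1/\alpha$ and checks numerically that its expectation under the null is about 1. For contrast, it also evaluates the likelihood ratio against a fixed alternative, a standard example of an unbounded $e$-value. None of this is taken from the paper: the function names, the choice $\alpha = 0.05$, the alternative mean `MU = 1.0`, and the midpoint-rule integration are all assumptions made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # null model: X ~ N(0, 1)
ALPHA = 0.05                       # assumed size, chosen for illustration
MU = 1.0                           # assumed alternative mean, chosen for illustration


def binary_test(x: float, alpha: float) -> float:
    """One-sided z-test of H0: mean = 0. Returns 1.0 on rejection, else 0.0."""
    critical_value = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 if x > critical_value else 0.0


def rescaled_test(x: float, alpha: float) -> float:
    """The same test moved to the evidence scale [0, 1/alpha]."""
    return binary_test(x, alpha) / alpha


def likelihood_ratio(x: float, mu: float) -> float:
    """Density ratio N(mu, 1) / N(0, 1) at x; unbounded above for mu > 0."""
    return math.exp(mu * x - 0.5 * mu * mu)


def null_expectation(f, lo: float = -10.0, hi: float = 10.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of E[f(X)] for X ~ N(0, 1)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += f(x) * STD_NORMAL.pdf(x)
    return total * width


# Rejection probability of the binary test under the null (should be close to ALPHA).
size = null_expectation(lambda x: binary_test(x, ALPHA))
# Null expectation of the rescaled test (should be close to 1, as for an e-value).
e_mean = null_expectation(lambda x: rescaled_test(x, ALPHA))
# Null expectation of the likelihood ratio (should also be close to 1).
lr_mean = null_expectation(lambda x: likelihood_ratio(x, MU))

print(f"size of binary test:        {size:.4f}")
print(f"mean of rescaled test:      {e_mean:.4f}")
print(f"mean of likelihood ratio:   {lr_mean:.4f}")
```

The rescaled test only takes the values 0 and $1/\alpha$, whereas the likelihood ratio varies continuously with the observation; both have null expectation of roughly 1 in this setup, which is the property that makes each an $e$-value.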