Continuous Testing: Unifying Tests and E-values
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Testing has developed into the fundamental statistical framework for
falsifying hypotheses. Unfortunately, tests are binary in nature: a test either
rejects a hypothesis or not. Such binary decisions do not reflect the reality
of many scientific studies, which often aim to present the evidence against a
hypothesis and do not necessarily intend to establish a definitive conclusion.
We propose a continuous generalization of a test, which we use to continuously
measure the evidence against a hypothesis. Such a continuous test can be viewed
as a continuous and non-randomized interpretation of the classical 'randomized
test'. This offers the benefits of a randomized test, without the downsides of
external randomization. Another interpretation is as a literal measure, which
measures the amount of binary tests that reject the hypothesis. Our work
unifies classical testing and the recently proposed $e$-values: $e$-values
bounded to $[0, 1/\alpha]$ are continuously interpreted size $\alpha$
randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we
use to define a level 0 continuous test. Moreover, we generalize the
traditional notion of power by using generalized means. This produces a
framework that contains both classical Neyman-Pearson optimal testing and
log-optimal $e$-values, as well as a continuum of other options. The
traditional $p$-value appears as the reciprocal of a generally invalid
continuous test. In an illustration in a Gaussian location model, we find that
optimal continuous tests are of a beautifully simple form. |
DOI: | 10.48550/arxiv.2409.05654 |
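
The summary's central correspondence, that $e$-values bounded to $[0, 1/\alpha]$ are size $\alpha$ randomized tests read on a continuous scale, can be illustrated with a minimal sketch. It assumes a Gaussian location model with null $X \sim N(0, 1)$. The function names (`to_e_scale`, `z_test`, `capped_likelihood_ratio`, `null_mean`) are hypothetical and do not come from the article, and the capped likelihood ratio is only one convenient bounded $e$-value, not necessarily the optimal continuous test the article derives.

```python
import math
import random
from statistics import NormalDist


def to_e_scale(tau: float, alpha: float) -> float:
    """Rescale a test value tau in [0, 1] to the evidence scale [0, 1/alpha]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return tau / alpha


def z_test(x: float, alpha: float) -> float:
    """Binary one-sided z-test of H0: mu = 0; returns 1.0 on rejection, else 0.0."""
    critical_value = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 if x > critical_value else 0.0


def capped_likelihood_ratio(x: float, mu: float, alpha: float) -> float:
    """Likelihood ratio of N(mu, 1) against N(0, 1), capped at 1/alpha.

    Illustrative assumption: capping keeps the null expectation at most 1,
    so this is an e-value bounded to [0, 1/alpha].
    """
    likelihood_ratio = math.exp(mu * x - 0.5 * mu ** 2)
    return min(likelihood_ratio, 1.0 / alpha)


def null_mean(statistic, n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[statistic(X)] under the null X ~ N(0, 1)."""
    rng = random.Random(seed)
    return sum(statistic(rng.gauss(0.0, 1.0)) for _ in range(n_draws)) / n_draws


if __name__ == "__main__":
    alpha = 0.05
    # A binary size-alpha test on the evidence scale takes only the values 0 and 1/alpha.
    print(null_mean(lambda x: to_e_scale(z_test(x, alpha), alpha)))
    # A bounded e-value uses the whole interval [0, 1/alpha].
    print(null_mean(lambda x: capped_likelihood_ratio(x, 1.0, alpha)))
```

Both estimates should come out close to 1, up to simulation noise, which is the validity condition shared by the rescaled test and the $e$-value. Multiplying the capped likelihood ratio by $\alpha$ gives a value in $[0, 1]$ that could serve as the rejection probability of a randomized test, which is the reading the summary describes.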