The essential histogram


Bibliographic details
Published in: Biometrika, 2020-06, Vol. 107 (2), pp. 347-364
Main authors: Li, Housen; Munk, Axel; Sieling, Hannes; Walther, Guenther
Format: Article
Language: English
Online access: Full text
Description
Summary: The histogram is widely used as a simple, exploratory way of displaying data, but it is usually not clear how to choose the number and size of the bins. We construct a confidence set of distribution functions that optimally deal with the two main tasks of the histogram: estimating probabilities and detecting features such as increases and modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. The only assumption we make is that the data are independent and identically distributed. We provide a fast algorithm for computing the essential histogram and illustrate our method with examples.
ISSN: 0006-3444, 1464-3510
DOI: 10.1093/biomet/asz081
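
The bin-choice problem that the summary starts from can be seen in a small sketch. The code below is not the essential histogram algorithm from the article. It only builds ordinary equal-width histograms for a made-up sample and counts their peaks, to show that the apparent number of modes depends on the number of bins. The names `equal_width_counts`, `count_peaks`, and `sample` are hypothetical and chosen for illustration.

```python
def equal_width_counts(data, num_bins):
    """Count observations in num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def count_peaks(counts):
    """Count local maxima of the bin counts; a flat plateau counts once."""
    padded = [0] + list(counts) + [0]
    peaks = 0
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            peaks += 1
    return peaks


# Made-up sample with two visible clusters (illustrative data only).
sample = [0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0]

for k in (2, 5):
    counts = equal_width_counts(sample, k)
    print(k, counts, count_peaks(counts))
```

With two bins the two clusters merge into a single flat block, while five bins separate them, so the same data suggest one mode or two depending on an arbitrary choice. The article addresses this by selecting the histogram with the fewest bins inside a confidence set.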