Information measures of kernel estimation
Published in: | Econometric Reviews, 2019-01, Vol. 38 (1), pp. 47-68 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Kernel estimates of entropy and mutual information have been studied extensively in statistics and econometrics. Kullback-Leibler divergence has been used in the kernel estimation literature, yet the information characteristics of kernel estimation remain unexplored. We explore kernel estimation as an information transmission operation in which the empirical cumulative distribution function is transformed into a smooth estimate. The smooth kernel estimate is a mixture of kernel functions. The Jensen-Shannon (JS) divergence of this mixture distribution provides the information measure of kernel estimation. This measure admits Kullback-Leibler and mutual information representations, and it provides a lower bound for the entropy of the kernel estimate of the distribution in terms of the Shannon entropy of the kernel function and the bandwidth. The JS divergence provides guidance for kernel choice based on information-theoretic considerations, which helps resolve a conundrum: it is legitimate and desirable to base such a choice on considerations other than the mean integrated squared error of the kernel smoother. We introduce a generalized polynomial kernel (GPK) family that nests a broad range of popular kernel functions, and we explore its properties in terms of Shannon and Rényi entropies. We show that these entropies and the variance order the GPK functions in the same way. The JS information measures of six kernel functions are compared via simulations from Gaussian, gamma, and Student-t data-generating processes. The proposed framework provides a foundation for further exploration of the information-theoretic nature of kernel smoothing. |
---|---|
ISSN: | 0747-4938, 1532-4168 |
DOI: | 10.1080/07474938.2016.1222236 |
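The abstract's central identity can be illustrated numerically. Assume the kernel estimate is the equal-weight mixture f_hat(x) = (1/n) sum_i K((x - x_i)/h)/h and the measure is the generalized JS divergence with weights 1/n, i.e. JS = H(f_hat) - (1/n) sum_i H(component_i). Every shifted, rescaled component has Shannon entropy H(K) + log h, so JS = H(f_hat) - H(K) - log h. Since JS >= 0, this gives the lower bound H(f_hat) >= H(K) + log h. JS also equals the mutual information between the component index and X, so JS <= log n. The following sketch, in Python with NumPy, evaluates these quantities by numerical integration on a grid. It is a minimal illustration under these assumptions, not the paper's implementation: the helper names (`js_information`, `kde_on_grid`, `entropy_on_grid`) are hypothetical, the Gaussian and Epanechnikov kernels stand in for the paper's GPK family, and the paper's exact normalization of the measure may differ.

```python
import numpy as np

# Unit-scale kernels used as stand-ins (the paper's GPK family is not reproduced here).
def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def entropy_on_grid(density, dx):
    """Differential entropy -∫ f log f dx via a Riemann sum on a uniform grid.

    Grid points where f == 0 contribute 0, the limit of f log f as f -> 0.
    """
    positive = density > 0
    safe = np.where(positive, density, 1.0)  # avoid log(0) warnings
    return -np.sum(np.where(positive, density * np.log(safe), 0.0)) * dx


def kde_on_grid(data, h, kernel, grid):
    """Equal-weight mixture f_hat(x) = (1/n) * sum_i K((x - x_i) / h) / h."""
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).mean(axis=1) / h


def js_information(data, h, kernel, reach=8.0, num=8001):
    """Hypothetical helper: JS divergence of the equal-weight KDE mixture.

    Returns (js, H(f_hat), H(K) + log h). Every component K_h(. - x_i) has
    entropy H(K) + log h, so JS = H(f_hat) - H(K) - log h, and JS >= 0
    yields the lower bound H(f_hat) >= H(K) + log h.
    """
    data = np.asarray(data, dtype=float)
    grid = np.linspace(data.min() - reach * h, data.max() + reach * h, num)
    h_mix = entropy_on_grid(kde_on_grid(data, h, kernel, grid), grid[1] - grid[0])

    # Entropy of the unit-scale kernel, also computed numerically.
    ugrid = np.linspace(-reach, reach, num)
    h_kernel = entropy_on_grid(kernel(ugrid), ugrid[1] - ugrid[0])

    lower_bound = h_kernel + np.log(h)
    return h_mix - lower_bound, h_mix, lower_bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)  # rule-of-thumb bandwidth
    for name, k in [("gaussian", gaussian), ("epanechnikov", epanechnikov)]:
        js, h_mix, bound = js_information(x, h, k)
        print(f"{name:>12}: JS={js:.4f}, H(f_hat)={h_mix:.4f} >= {bound:.4f}, "
              f"log n={np.log(len(x)):.4f}")
```

Using one bandwidth for both kernels is only for illustration. The Gaussian kernel has variance 1 and the Epanechnikov kernel has variance 1/5, so a fair comparison between kernels would rescale h for each kernel.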