Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Perry, Ronan, Mehta, Ronak, Guo, Richard, Yezerets, Eva, Arroyo, Jesús, Powell, Mike, Helm, Hayden, Shen, Cencheng, Vogelstein, Joshua T
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Learning Statistics - Machine Learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees in certain simple scenarios. However, existing approaches fail in high-dimensional settings and when different features are measured on different scales.We propose decision forest-based adaptive nearest neighbor estimators and show that they are able to effectively estimate posterior probabilities, conditional entropies, and mutual information even in the aforementioned settings.We provide an extensive study of efficacy for classification and posterior probability estimation, and prove certain forest-based approaches to be consistent estimators of the true posteriors and derived information-theoretic quantities under certain assumptions. In a real-world connectome application, we quantify the uncertainty about neuron type given various cellular features in the Drosophila larva mushroom body, a key challenge for modern neuroscience.
DOI:	10.48550/arxiv.1907.00325