Machine Learning using Principal Manifolds and Mode Seeking

Bibliographic details
Author:	Myhre, Jonas Nordhaug
Format:	Dissertation
Language:	English
Subjects:	Machine Learning

Description
Abstract:	A wide range of machine learning methods take advantage of density estimates and their derivatives, including methods related to principal manifolds and mode seeking, and these have found use in a number of real applications. However, research concerned with improving density derivative estimation and its practical use has received relatively limited attention. Moreover, the fact that the derivatives of a distribution over a point set can provide a statistical framework for manifold learning has not yet been used to its full potential. The aim of this thesis is to help fill these gaps and to provide novel machine learning algorithms and tools based on principal manifolds, using density derivatives. We present three lines of work toward this goal. The first presents a fast and exact kernel density derivative estimator. The method exploits the fact that the derivatives of a multivariate product kernel can be decomposed into a product of univariate derivatives. By eliminating redundant multiplications, we obtain a significant speedup while retaining an exact estimator. Next, we present a novel algorithm for manifold unwrapping based on tracing the gradient flow along a manifold estimated using density derivatives. This gives a direct and geometrically intuitive approach that is consistent with theory from differential geometry. Promising results are shown on both real and synthetic data sets. Finally, we provide a novel framework for robust mode seeking based on ensemble clustering and resampling techniques. This yields a clustering algorithm that is both robust to parameter choices and capable of handling data sets of very high dimension. Concretely, we build the ensemble by running multiple instances of a k-nearest-neighbor mode seeking algorithm. We show good results on benchmark tests, as well as in a case study involving medical health records.
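The product-kernel decomposition behind the first estimator can be illustrated for the gradient (first-order partial derivatives) of a kernel density estimate. For a product kernel, the partial derivative in coordinate m is the univariate derivative k'(u_m) times the kernel values k(u_j) in all other coordinates. Each univariate value therefore needs to be evaluated only once per sample, and prefix and suffix products let all d partials share their multiplications. This is a minimal sketch under stated assumptions (a Gaussian kernel, one bandwidth h_j per coordinate, first derivatives only, and the hypothetical helper names gauss, gauss_deriv, kde and kde_gradient). It is not the estimator from the thesis itself.

```python
import math
from typing import List, Sequence

# Assumption: standard Gaussian univariate kernel; the thesis is not limited to this choice.
def gauss(u: float) -> float:
    """Univariate standard Gaussian kernel k(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_deriv(u: float) -> float:
    """First derivative k'(u) of the univariate Gaussian kernel."""
    return -u * gauss(u)


def kde(x: Sequence[float], data: Sequence[Sequence[float]], h: Sequence[float]) -> float:
    """Product-kernel density estimate f(x) = 1/(n*prod(h)) * sum_i prod_j k((x_j - X_ij)/h_j)."""
    d, n = len(x), len(data)
    norm = 1.0 / (n * math.prod(h))
    return norm * sum(
        math.prod(gauss((x[j] - row[j]) / h[j]) for j in range(d)) for row in data
    )


def kde_gradient(
    x: Sequence[float], data: Sequence[Sequence[float]], h: Sequence[float]
) -> List[float]:
    """Exact gradient of the product-kernel KDE at x.

    df/dx_m = 1/(n*prod(h)) * sum_i k'(u_im)/h_m * prod_{j != m} k(u_ij)
    """
    d, n = len(x), len(data)
    norm = 1.0 / (n * math.prod(h))
    grad = [0.0] * d
    for row in data:
        u = [(x[j] - row[j]) / h[j] for j in range(d)]
        k = [gauss(uj) for uj in u]          # each univariate kernel value computed once
        dk = [gauss_deriv(uj) for uj in u]   # each univariate derivative computed once
        # prefix[m] = k[0] * ... * k[m-1]; suffix[m] = k[m+1] * ... * k[d-1]
        prefix = [1.0] * d
        for j in range(1, d):
            prefix[j] = prefix[j - 1] * k[j - 1]
        suffix = [1.0] * d
        for j in range(d - 2, -1, -1):
            suffix[j] = suffix[j + 1] * k[j + 1]
        # Partial m: derivative in coordinate m times the kernel product over all other coordinates.
        for m in range(d):
            grad[m] += dk[m] / h[m] * prefix[m] * suffix[m]
    return [norm * g for g in grad]


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.5], [2.0, -1.0]]
    print(kde_gradient([0.3, 0.2], data, [0.7, 1.1]))
```

With the shared prefix and suffix products, the d partial derivatives cost O(d) multiplications per sample instead of the O(d²) needed to form each (d-1)-factor product from scratch. The sketch also avoids dividing the full product by k(u_m), which would fail wherever a kernel factor underflows to zero.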