Choosing ℓ p norms in high-dimensional spaces based on hub analysis

The hubness phenomenon is a recently discovered aspect of the curse of dimensionality. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. A closely related problem is the concentration of distances in high-dimensional...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neurocomputing (Amsterdam) 2015-12, Vol.169, p.281-287
Hauptverfasser: Flexer, Arthur, Schnitzer, Dominik
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The hubness phenomenon is a recently discovered aspect of the curse of dimensionality. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. A closely related problem is the concentration of distances in high-dimensional spaces. Previous work has already advocated the use of fractional ℓ norms instead of the ubiquitous Euclidean norm to avoid the negative effects of distance concentration. However, which exact fractional norm to use is a largely unsolved problem. The contribution of this work is an empirical analysis of the relation of different ℓ norms and hubness. We propose an unsupervised approach for choosing an ℓ norm which minimizes hubs while simultaneously maximizing nearest neighbor classification. Our approach is evaluated on seven high-dimensional data sets and compared to three approaches that re-scale distances to avoid hubness.
ISSN:0925-2312