Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification
Published in: | International journal of machine learning and cybernetics 2014-06, Vol.5 (3), p.445-458 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Most data of interest today in data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is by its very nature often quite difficult to handle by conventional machine-learning algorithms. This is considered to be an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, which is why the design of machine-learning algorithms needs to take these factors into account. Furthermore, it was observed that some of the arising high-dimensional properties could in fact be exploited in improving overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier has recently been proposed which exploits this notion. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier. |
---|---|
ISSN: | 1868-8071 1868-808X |
DOI: | 10.1007/s13042-012-0137-1 |
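
The summary describes hubness as the emergence of points that appear in many k-neighborhoods of other points. A minimal sketch of how such occurrence counts could be computed is given below. It is an illustration under stated assumptions, not the paper's method: the function names, the synthetic data, the Euclidean distance, and the smoothed class profile in `fuzzy_occurrence_profile` are all hypothetical choices made for this example, and the record does not specify the fuzzy measures the authors actually propose.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def k_occurrences(neighbors, n_points):
    """Count how many k-neighbor lists each point appears in."""
    return np.bincount(neighbors.ravel(), minlength=n_points)


def class_k_occurrences(neighbors, y, n_classes):
    """Per-class counts: occurrences of each point in the k-neighbor
    lists of points carrying each label."""
    n, k = neighbors.shape
    counts = np.zeros((n, n_classes), dtype=int)
    # The label of row i is credited to every neighbor listed in row i.
    np.add.at(counts, (neighbors.ravel(), np.repeat(y, k)), 1)
    return counts


def fuzzy_occurrence_profile(counts, smoothing=1.0):
    """Assumed illustration: turn per-class counts into a soft class
    profile per point, with additive smoothing so rare points stay defined."""
    smoothed = counts + smoothing
    return smoothed / smoothed.sum(axis=1, keepdims=True)


# Synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.integers(0, 2, size=100)
k = 5

neighbors = knn_indices(X, k)
n_k = k_occurrences(neighbors, X.shape[0])
counts = class_k_occurrences(neighbors, y, n_classes=2)
profile = fuzzy_occurrence_profile(counts)

print("largest occurrence count:", n_k.max())
print("points never chosen as a neighbor:", int((n_k == 0).sum()))
```

Every point contributes exactly k entries to the neighbor lists, so the occurrence counts always sum to n times k. Hubness shows up as skew in how that fixed total is distributed: a few points collect many occurrences while others collect none.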