Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification

Bibliographic details
Published in:	International Journal of Machine Learning and Cybernetics, June 2014, Vol. 5 (3), pp. 445-458
Main authors:	Tomašev, Nenad; Radovanović, Miloš; Mladenić, Dunja; Ivanović, Mirjana
Format:	Article
Language:	English
Keywords:	Algorithms, Artificial Intelligence, Classification, Classifiers, Complex Systems, Computational Intelligence, Control, Data mining, Engineering, Influence, K-nearest neighbors algorithm, Machine learning, Mechatronics, Original Article, Pattern Recognition, Robotics, Systems Biology

Description
Abstract:	Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is by its very nature often quite difficult for conventional machine-learning algorithms to handle. This is considered an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, which is why the design of machine-learning algorithms needs to take these factors into account. Furthermore, it has been observed that some of the arising high-dimensional properties can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in the k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as over the standard kNN classifier.
ISSN:	1868-8071, 1868-808X
DOI:	10.1007/s13042-012-0137-1
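The abstract names the ingredients of a hubness-based fuzzy kNN classifier without giving formulas. The Python sketch below illustrates one plausible way to combine them. It is written under stated assumptions and is not the authors' method. It counts how often each training point occurs in the k-neighborhoods of training points of each class (class-conditional k-occurrences). It turns these counts into Laplace-smoothed fuzzy class memberships. It then classifies a query with the standard fuzzy k-NN distance weighting (inverse distance raised to 2/(m-1)). All function names, the smoothing parameter `lam`, the fuzzifier `m`, and the toy data are hypothetical. Points that never occur as neighbors (anti-hubs) simply receive uniform memberships here, whereas a fuller method would need a better estimate for them.

```python
import math
from collections import Counter


def knn_indices(data, query, k, exclude=None):
    """Return (index, distance) pairs for the k points in `data` closest to
    `query` (Euclidean), optionally skipping index `exclude` (the query itself)."""
    dists = sorted(
        (math.dist(query, p), j) for j, p in enumerate(data) if j != exclude
    )
    return [(j, d) for d, j in dists[:k]]


def class_hubness(data, labels, k):
    """Hypothetical helper: occ[j][c] counts how often training point j occurs
    among the k nearest neighbors of training points labeled c."""
    occ = [Counter() for _ in data]
    for i, x in enumerate(data):
        for j, _ in knn_indices(data, x, k, exclude=i):
            occ[j][labels[i]] += 1
    return occ


def fuzzy_memberships(occ, classes, lam=1.0):
    """Assumed Laplace-smoothed memberships:
    u_c(x) = (N_{k,c}(x) + lam) / (N_k(x) + |C| * lam).
    Memberships of each point sum to 1; points with N_k(x) = 0 get uniform ones."""
    memberships = []
    for counts in occ:
        total = sum(counts.values())  # N_k(x): total k-occurrence of the point
        memberships.append(
            {c: (counts[c] + lam) / (total + len(classes) * lam) for c in classes}
        )
    return memberships


def classify(query, data, memberships, classes, k, m=2.0, eps=1e-12):
    """Fuzzy k-NN vote: each neighbor contributes its class memberships,
    weighted by 1 / d^(2/(m-1)); returns class scores normalized to sum to 1."""
    scores = dict.fromkeys(classes, 0.0)
    norm = 0.0
    for j, d in knn_indices(data, query, k):
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)  # eps guards against d == 0
        norm += w
        for c in classes:
            scores[c] += w * memberships[j][c]
    return {c: s / norm for c, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
    labels = ["a", "a", "a", "b", "b", "b"]
    classes = sorted(set(labels))
    u = fuzzy_memberships(class_hubness(data, labels, k=2), classes)
    print(classify((0.05, 0.05), data, u, classes, k=2))
```

The predicted label is the class with the highest score. Because the scores are normalized memberships rather than a single crisp vote, the winning score can be read as a rough confidence value for the prediction.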