Classifying token frequencies using angular Minkowski p-distance

Bibliographic details
Main authors: Lenz, Oliver Urs; Cornelis, Chris
Format: Conference proceeding
Language: English
Description
Summary: Angular Minkowski p-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski p-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski p-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter p, the dimensionality m of the dataset, the number of neighbours k, the choice of weights and the choice of classifier. We conclude that, for suitable values of p, angular Minkowski p-distance can yield substantially higher classification performance than classical cosine dissimilarity.
ISSN: 1611-3349, 0302-9743
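
Illustration of the measure named in the summary. The summary does not spell out a formula, so the following is only a sketch under a stated assumption: cosine dissimilarity is, up to a monotone transformation, the Euclidean distance between vectors scaled to unit Euclidean norm, and the generalisation is assumed here to scale each vector to unit p-norm and then take the Minkowski p-distance. The function names and the toy frequency vectors are hypothetical and not taken from the paper.

```python
import math

def p_norm(x, p):
    """Minkowski p-norm of a sequence of numbers (p >= 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def angular_minkowski(x, y, p=2.0):
    """Assumed form: p-distance between x and y after scaling each to unit p-norm."""
    nx, ny = p_norm(x, p), p_norm(y, p)
    if nx == 0 or ny == 0:
        raise ValueError("undefined for all-zero vectors")
    return p_norm([a / nx - b / ny for a, b in zip(x, y)], p)

def cosine_dissimilarity(x, y):
    """Classical cosine dissimilarity: 1 minus the cosine of the angle."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (p_norm(x, 2) * p_norm(y, 2))

# Toy token-frequency vectors (made up for illustration).
doc_a = [3, 0, 1, 2]
doc_b = [1, 1, 0, 4]

# For p = 2 the squared distance equals twice the cosine dissimilarity,
# so both measures rank neighbours identically.
d2 = angular_minkowski(doc_a, doc_b, p=2)
assert math.isclose(d2 ** 2, 2 * cosine_dissimilarity(doc_a, doc_b))

# Other values of p give a different dissimilarity on the same vectors.
d1 = angular_minkowski(doc_a, doc_b, p=1)
```

Under this reading, p = 2 reproduces the neighbour ranking of cosine dissimilarity, and p acts as a hyperparameter to be tuned alongside the number of neighbours k.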