Automatic Recommendation of a Distance Measure for Clustering Algorithms
Published in: | ACM Transactions on Knowledge Discovery from Data, January 2021, Vol. 15 (1), pp. 1-22, Article 7 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Given the large number of available distance measures, choosing an appropriate one for clustering a given data set with a specified clustering algorithm is an important problem. This article proposes an automatic method for recommending a distance measure for clustering algorithms. The method consists of three steps: (1) metadata extraction, comprising meta-feature collection and meta-target identification; (2) construction of a recommendation model from the metadata; and (3) recommendation of a distance measure for a new data set using that model. Two different types of meta-targets and meta-learning techniques are used to accommodate the potentially different requirements of users.
To validate the necessity and effectiveness of the recommendation method, an empirical study is conducted on 199 publicly available data sets with 9 distance measures and 2 widely used clustering algorithms. The experimental results indicate that the choice of distance measure significantly influences the performance of a clustering algorithm on a given data set. Furthermore, a performance analysis of the proposed recommendation method demonstrates its effectiveness. |
---|---|
ISSN: | 1556-4681, 1556-472X |
DOI: | 10.1145/3418228 |
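The abstract names the three steps of the recommendation procedure but not the concrete meta-features, meta-targets, or meta-learners. The Python sketch below shows one way such a pipeline could fit together. Everything in it is an assumption for illustration, not the article's method: the meta-features (log of the instance count, log of the attribute count, mean per-attribute standard deviation), the two meta-target types (the single best measure and a best-first ranking), the k-nearest-neighbour recommender, and all names (`meta_features`, `best_measure`, `rank_measures`, `KNNRecommender`). The per-data-set clustering scores are assumed to be precomputed, for example by running a clustering algorithm once with each candidate distance measure and scoring the result.

```python
import math
from collections import Counter
from statistics import mean, pstdev


# Step (1a): meta-feature collection.
# ASSUMPTION: these three meta-features are illustrative, not the article's set.
def meta_features(data):
    """Return [log #instances, log #attributes, mean attribute std] for a numeric table."""
    n, d = len(data), len(data[0])
    stds = [pstdev(column) for column in zip(*data)]
    return [math.log(n), math.log(d), mean(stds)]


# Step (1b): meta-target identification, in two hypothetical forms.
def best_measure(scores):
    """Meta-target type A (assumed): the single best-scoring distance measure."""
    return max(scores, key=scores.get)


def rank_measures(scores):
    """Meta-target type B (assumed): all measures ordered best-first."""
    return sorted(scores, key=scores.get, reverse=True)


# Step (2): recommendation model construction.
# ASSUMPTION: a k-nearest-neighbour model over meta-features stands in for
# whatever meta-learning techniques the article actually uses.
class KNNRecommender:
    def __init__(self, k=3):
        self.k = k

    def fit(self, datasets, score_tables):
        """score_tables[i] maps measure name -> clustering quality on datasets[i]."""
        self.meta_x = [meta_features(ds) for ds in datasets]
        self.best = [best_measure(s) for s in score_tables]
        self.ranks = [rank_measures(s) for s in score_tables]
        return self

    def _neighbours(self, new_data):
        """Indices of the k historical data sets closest in meta-feature space."""
        q = meta_features(new_data)
        order = sorted(range(len(self.meta_x)), key=lambda i: math.dist(q, self.meta_x[i]))
        return order[: self.k]

    # Step (3): recommendation for a new data set.
    def recommend(self, new_data):
        """Majority vote over the neighbours' best measures (type A meta-target)."""
        votes = Counter(self.best[i] for i in self._neighbours(new_data))
        return votes.most_common(1)[0][0]

    def recommend_ranking(self, new_data):
        """Order measures by mean rank position among the neighbours (type B)."""
        positions = {}
        for i in self._neighbours(new_data):
            for pos, name in enumerate(self.ranks[i]):
                positions.setdefault(name, []).append(pos)
        return sorted(positions, key=lambda name: mean(positions[name]))


if __name__ == "__main__":
    # Toy data: alternating 0/1 columns, so only the table shape differs.
    def make(n, d):
        return [[float(i % 2)] * d for i in range(n)]

    history = [make(4, 2), make(5, 2), make(40, 10), make(50, 10)]
    scores = [  # hypothetical precomputed clustering quality per measure
        {"euclidean": 0.80, "manhattan": 0.70, "cosine": 0.50},
        {"euclidean": 0.75, "manhattan": 0.60, "cosine": 0.40},
        {"cosine": 0.90, "manhattan": 0.60, "euclidean": 0.50},
        {"cosine": 0.85, "euclidean": 0.55, "manhattan": 0.50},
    ]
    model = KNNRecommender(k=3).fit(history, scores)
    print(model.recommend(make(6, 2)))          # euclidean
    print(model.recommend(make(45, 10)))        # cosine
    print(model.recommend_ranking(make(6, 2)))  # ['euclidean', 'manhattan', 'cosine']
```

The two `recommend*` methods mirror the idea of serving users who want either one answer or a full ordering of candidates. The meta-features are deliberately left unscaled to keep the sketch short; with real data one would normally standardise them before computing neighbour distances, since otherwise the feature with the largest range dominates.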