An MDS-based unifying approach to time series K-means clustering: application in the dynamic time warping framework
Published in: | Stochastic Environmental Research and Risk Assessment 2023-12, Vol. 37 (12), p. 4555-4566 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Abstract: | Partitioning algorithms, and in particular K-means clustering, are widely used in time series analysis. K-means clustering is intrinsically related to the use of the Euclidean distance as a measure of dissimilarity. When other dissimilarity measures, such as dynamic time warping, are involved, K-means clustering is usually replaced by the optimisation of a sums-of-the-stars clustering criterion, giving rise to an algorithm other than that of K-means, such as K-medoids. Another common restriction in the implementation of K-means concerns the need to estimate the average as the cluster prototype, which may represent a drawback for this method in time series when elastic measures such as dynamic time warping are used. In this paper, we propose a multidimensional scaling based K-means clustering algorithm that enables the use of K-means clustering together with any dissimilarity measure, and in particular with dynamic time warping, without requiring us to estimate cluster prototypes for the time series. This procedure is a true K-means clustering algorithm that searches for the partition in an equivalent auxiliary configuration, usually in a dimension lower than the time series length. The approach proposed is of particular interest when dynamic time warping is used in the analysis of series of unequal length and/or when some data are missing, and hence Euclidean distances cannot be used. The performance of our procedure is tested by conducting an extensive Monte Carlo experiment, comparing the results with those obtained by K-medoids. The procedure is also illustrated with the analysis of carbon dioxide emissions from 133 countries. |
---|---|
ISSN: | 1436-3240 1436-3259 |
DOI: | 10.1007/s00477-023-02470-9 |
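
The general idea described in the abstract (compute pairwise dissimilarities, embed them in an auxiliary Euclidean configuration, then run ordinary K-means there) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which multidimensional scaling variant is used, so classical (Torgerson) MDS is swapped in here as an assumption, with negative eigenvalues clipped because dynamic time warping is not a Euclidean distance. All function names, the toy series, the embedding dimension `dim=2`, and the deterministic farthest-point seeding are hypothetical choices made for illustration only.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def dissimilarity_matrix(series):
    """Symmetric matrix of pairwise DTW dissimilarities."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D

def classical_mds(D, dim):
    """Classical (Torgerson) MDS; an assumed stand-in for the paper's MDS step."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]       # keep the `dim` largest
    vals = np.clip(vals[order], 0.0, None)     # DTW is non-Euclidean: clip negatives
    return vecs[:, order] * np.sqrt(vals)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data (hypothetical): two level groups, series of unequal length.
series = [np.full(8, 0.0), np.full(10, 0.1), np.full(12, -0.1),
          np.full(9, 5.0), np.full(11, 5.2), np.full(7, 4.9)]

D = dissimilarity_matrix(series)   # DTW handles the unequal lengths
X = classical_mds(D, dim=2)        # auxiliary Euclidean configuration
labels = kmeans(X, k=2)            # ordinary K-means on the configuration
print(labels)
```

In this sketch the cluster means are computed only in the embedded configuration, so no averaged time series prototype is ever needed, which is the property the abstract emphasises. On the toy data, the three low-level series and the three high-level series should fall into separate clusters.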