Clustering interval-valued data with adaptive Euclidean and City-Block distances

Bibliographic details
Published in:	Expert systems with applications 2022-07, Vol.198, p.116774, Article 116774
Main authors:	Rizo Rodríguez, Sara Inés; Tenório de Carvalho, Francisco de Assis
Format:	Article
Language:	English
Subjects:	Adaptive distances; Algorithms; Boundaries; Clustering; Interval-valued data analysis; Outliers (statistics); Partitioning clustering; Pollution levels; Robust clustering; Subspaces; Weather stations
Online access:	Full text

Description
Abstract:	In several applications, data information is obtained in the form of intervals, such as the monthly temperature in a meteorological station or daily pollution levels in different locations. This paper proposes partitioning clustering algorithms for interval-valued data based on adaptive Euclidean and City-Block distances. Since some boundary variables may be more relevant for the clustering process, the proposals consider the joint weights of the relevance of the lower and upper boundaries of the interval-valued variables. Consequently, clusters of different shapes and sizes in some subspaces of the variables, even in specific boundaries of the interval-valued data, can be recognized. In addition, robust dissimilarity functions were introduced to reduce the influence of outliers in the data. The adaptive distances change at each iteration of the algorithms and can be different from one cluster to another. The methods optimize an objective function by alternating three steps for obtaining the representatives of each group, the cluster partition, and the relevance weights for the interval-valued variables. Experiments on synthetic and real data sets corroborate the robustness and usefulness of the proposed adaptive clustering methods.
• New clustering algorithms for interval-valued data are proposed.
• The methods introduce local and global adaptive distances.
• The distances consider the joint relevance of the variables of each boundary.
• Experiments on synthetic and real data sets show the usefulness of the approaches.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2022.116774
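
The alternating three-step scheme described in the abstract (representatives, partition, relevance weights) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not give the objective function, the weight constraint, or the robust dissimilarity, so the code below assumes a squared Euclidean distance, cluster-specific (local) weights over the stacked lower and upper boundaries, and a product-to-one constraint on the weights, which is a common choice for adaptive distances. The function name `adaptive_interval_kmeans` and all parameters are hypothetical.

```python
import numpy as np


def adaptive_interval_kmeans(lower, upper, n_clusters, n_iter=50, seed=0, eps=1e-12):
    """Illustrative sketch only; assumptions are listed in the lead-in text.

    lower, upper: arrays of shape (n, p) holding the interval boundaries.
    Returns labels (n,), centers (K, 2p) and weights (K, 2p).
    """
    rng = np.random.default_rng(seed)
    # Stack both boundaries so each one gets its own relevance weight.
    X = np.hstack([lower, upper]).astype(float)
    n, m = X.shape
    centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    weights = np.ones((n_clusters, m))
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Partition step: assign each object to the closest representative
        # under that cluster's current weighted squared Euclidean distance.
        diff = X[:, None, :] - centers[None, :, :]
        dist = np.einsum("nkm,km->nk", diff ** 2, weights)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                continue  # keep the previous representative and weights
            # Representative step: the mean minimizes the squared distance.
            centers[k] = members.mean(axis=0)
            # Weight step (assumed constraint: product of weights equals 1):
            # weight_j = geometric mean of dispersions / dispersion_j.
            disp = ((members - centers[k]) ** 2).sum(axis=0) + eps
            weights[k] = np.exp(np.log(disp).mean()) / disp

    return labels, centers, weights
```

Under these assumptions, a boundary with small within-cluster dispersion receives a large weight, so a cluster can be compact in, say, the lower boundaries only. A City-Block variant would presumably replace the squared differences with absolute differences and the mean with a median, but the record does not spell this out.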