An improved clustering algorithm and its application in IoT data analysis


Bibliographic details
Published in: Computer networks (Amsterdam, Netherlands : 1999), 2019-08, Vol.159, p.63-72
Main authors: Yao, Xuanxia, Wang, Jiafei, Shen, Mengyu, Kong, Huafeng, Ning, Huansheng
Format: Article
Language: eng
Description
Summary: With the spread of the Internet of Things (IoT), data volumes are exploding. Data analysis is the foundation of IoT-based applications, and clustering is an important tool for data analysis. In clustering, determining the number of clusters is an important issue; the number can be either specified manually or determined automatically. Manual specification has many disadvantages, whereas automatic methods have distinct advantages, and their critical task is to design an appropriate algorithm for updating the number of clusters. Although much research has been done, most existing methods are either ineffective or cannot guarantee unique clustering results and a good clustering accuracy rate. Meanwhile, considering that IoT-based applications always involve both numerical and non-numerical data, and that treating all non-numerical data in the same way is impractical, we further classify the non-numerical attributes according to their nature and explore a corresponding similarity metric for each class. On this basis, an algorithm for determining the initial cluster centers is put forward, based on the dissimilarities and densities of data objects. An improved clustering algorithm is then designed on a revised inter-cluster entropy for mixed data. Experiments on three datasets from the University of California at Irvine (UCI) show that the improved clustering algorithm is a deterministic clustering algorithm with good performance.
ISSN: 1389-1286, 1872-7069
DOI: 10.1016/j.comnet.2019.04.022
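
The summary mentions choosing initial cluster centers from the dissimilarities and densities of mixed (numerical and non-numerical) data objects. The record does not give the article's formulas, so the following is only a generic sketch of that idea and not the authors' algorithm. It assumes a Gower-style dissimilarity (range-normalized difference for numbers, simple matching for strings), a neighbor-count density, and a greedy selection rule; the names `mixed_dissimilarity`, `initial_centers`, `radius`, `min_separation`, and `min_density` are all hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]
Record = Sequence[Value]


def numeric_ranges(data: Sequence[Record]) -> List[float]:
    """Per-attribute value range; 0.0 is a placeholder for string attributes."""
    ranges = []
    for column in zip(*data):
        if isinstance(column[0], str):
            ranges.append(0.0)
        else:
            ranges.append(max(column) - min(column))
    return ranges


def mixed_dissimilarity(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Assumed Gower-style dissimilarity in [0, 1] for mixed records."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            # Non-numerical attribute: simple matching (0 if equal, else 1).
            total += 0.0 if x == y else 1.0
        else:
            # Numerical attribute: absolute difference scaled by the range.
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(a)


def initial_centers(
    data: Sequence[Record],
    radius: float,
    min_separation: float,
    min_density: int = 1,
) -> List[int]:
    """Return indices of candidate initial centers (illustrative rule only).

    Density of an object = number of other objects within `radius`.
    Objects are visited from densest to sparsest (ties broken by index,
    so the result is deterministic), and an object becomes a center if it
    is dense enough and at least `min_separation` from every chosen center.
    """
    ranges = numeric_ranges(data)
    n = len(data)
    dist = [
        [mixed_dissimilarity(data[i], data[j], ranges) for j in range(n)]
        for i in range(n)
    ]
    density = [
        sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: (-density[i], i))
    centers: List[int] = []
    for i in order:
        if density[i] < min_density:
            continue  # skip isolated objects (likely outliers)
        if all(dist[i][c] >= min_separation for c in centers):
            centers.append(i)
    return centers


# Toy mixed data: one numerical and one nominal attribute per record.
data = [
    (1.0, "a"), (1.1, "a"), (1.2, "a"),
    (9.0, "b"), (9.1, "b"), (9.2, "b"),
]
print(initial_centers(data, radius=0.1, min_separation=0.5))  # prints [0, 3]
```

In this sketch the number of centers is not fixed in advance but falls out of the two thresholds, and the index-based tie-breaking makes repeated runs return the same centers, which loosely mirrors the automatic and deterministic behavior the summary describes.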