Clustering based approach for incomplete data streams processing

Bibliographic details
Published in: Journal of intelligent & fuzzy systems 2020-01, Vol.38 (3), p.3213-3227
Main authors: Najib, Fatma M., Ismail, Rasha M., Badr, Nagwa L., Gharib, Tarek F.
Format: Article
Language: English
Online access: Full text
Description
Summary: Recent applications such as sensor networks generate continuous and dynamic data streams. Data streams are often gathered from multiple data sources with some incompleteness. Clustering such data is constrained by the incompleteness of the data, the data distribution, and the continuous nature of data streams. Ignoring missing values when clustering incomplete data, especially at high missing rates, decreases clustering performance. Traditional clustering is applied to the whole data set without accounting for data distribution. This paper presents an efficient framework called Fuzzy c-means clustering for Incomplete Data streams (FID) that works adaptively with incomplete data streams even at high missing rates. The proposed FID estimates missing values based on the corresponding nearest-neighbors' intervals. To overcome the data stream clustering problems mentioned above, the continuous clustering mechanism is adopted and extended to accurately handle incomplete data streams. Experimental results on two different data sets demonstrate the efficiency of the proposed FID compared to the alternative approaches.
ISSN: 1064-1246, 1875-8967
DOI: 10.3233/JIFS-191184