Incomplete high dimensional data streams clustering

Many recent applications such as sensor networks generate continuous and time varying data streams that are often gathered from multiple data sources with some incompleteness and high dimensionality. Clustering such incomplete high dimensional streaming data faces four constraints which are 1) data...

Bibliographic details
Published in: Journal of intelligent & fuzzy systems 2020-01, Vol.39 (3), p.4227-4243
Main authors: Najib, Fatma M., Ismail, Rasha M., Badr, Nagwa L., Gharib, Tarek F.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 4243
container_issue 3
container_start_page 4227
container_title Journal of intelligent & fuzzy systems
container_volume 39
creator Najib, Fatma M.
Ismail, Rasha M.
Badr, Nagwa L.
Gharib, Tarek F.
description Many recent applications such as sensor networks generate continuous and time varying data streams that are often gathered from multiple data sources with some incompleteness and high dimensionality. Clustering such incomplete high dimensional streaming data faces four constraints which are 1) data incompleteness, 2) high dimensionality of data, 3) data distribution, 4) data streams’ continuous nature. Thus, in this paper, we propose the Subspace clustering for Incomplete High dimensional Data streams (SIHD) framework that overcomes the above clustering issues. The proposed SIHD provides continuous missing values imputation for incomplete streams based on the corresponding nearest-neighbors’ intervals. An adaptive subspace clustering mechanism is proposed to deal with such incomplete high dimensional data streams. Our experimental results using two different data sets prove the efficiency of the proposed SIHD framework in clustering such incomplete high dimensional data streams in terms of accuracy, precision, sensitivity, specificity, and F-score compared to five algorithms GFCM, GBDC-P2P, DS, Ensemble, and DMSC. The proposed SIHD improved: 1) the accuracy on average over the five algorithms in the same mentioned order by 11.3%, 10.8%, 6.5%, 4.1%, and 3.6%, 2) the precision by 15%, 10.6%, 6.4%, 4%, and 3.5%, 3) the sensitivity by 16.6%, 10.6%, 5.8%, 4.2%, and 3.6%, 4) the specificity by 16.8%, 10.9%, 6.5%, 4%, and 3.5%, 5) the F-score by 16.6%, 10.7%, 6.6%, 4.1%, and 3.6%.
doi_str_mv 10.3233/JIFS-200297
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2449456934</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2449456934</sourcerecordid><originalsourceid>FETCH-LOGICAL-c261t-193ad7995868618213d9f8a82f5c8c75f9c27eefd2f63896ea7e5c8d3ea5a65f3</originalsourceid><addsrcrecordid>eNotkEtLAzEcxIMoWKsnv8CCR1nN-3GUYutKwYN6DiH7T7tlXybZg9_eLetpBmYYhh9C9wQ_McrY83u1_SwpxtSoC7QiWolSG6kuZ48lLwnl8hrdpHTCmChB8QqxqvdDN7aQoTg2h2NRNx30qRl61xa1y65IOYLrUuHbKWWITX-4RVfBtQnu_nWNvrevX5u3cv-xqzYv-9JTSXJJDHO1MkZoqSXRlLDaBO00DcJrr0QwniqAUNMg2XwTnII5qRk44aQIbI0elt0xDj8TpGxPwxTnY8lSzg0X0jA-tx6Xlo9DShGCHWPTufhrCbZnKvZMxS5U2B_wUlQW</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2449456934</pqid></control><display><type>article</type><title>Incomplete high dimensional data streams clustering</title><source>EBSCOhost Business Source Complete</source><creator>Najib, Fatma M. ; Ismail, Rasha M. ; Badr, Nagwa L. ; Gharib, Tarek F.</creator><creatorcontrib>Najib, Fatma M. ; Ismail, Rasha M. ; Badr, Nagwa L. ; Gharib, Tarek F.</creatorcontrib><description>Many recent applications such as sensor networks generate continuous and time varying data streams that are often gathered from multiple data sources with some incompleteness and high dimensionality. Clustering such incomplete high dimensional streaming data faces four constraints which are 1) data incompleteness, 2) high dimensionality of data, 3) data distribution, 4) data streams’ continuous nature. Thus, in this paper, we propose the Subspace clustering for Incomplete High dimensional Data streams (SIHD) framework that overcomes the above clustering issues. The proposed SIHD provides continuous missing values imputation for incomplete streams based on the corresponding nearest-neighbors’ intervals. 
An adaptive subspace clustering mechanism is proposed to deal with such incomplete high dimensional data streams. Our experimental results using two different data sets prove the efficiency of the proposed SIHD framework in clustering such incomplete high dimensional data streams in terms of accuracy, precision, sensitivity, specificity, and F-score compared to five algorithms GFCM, GBDC-P2P, DS, Ensemble, and DMSC. The proposed SIHD improved: 1) the accuracy on average over the five algorithms in the same mentioned order by 11.3%, 10.8%, 6.5%, 4.1%, and 3.6%, 2) the precision by 15%, 10.6%, 6.4%, 4%, and 3.5%, 3) the sensitivity by 16.6%, 10.6%, 5.8%, 4.2%, and 3.6%, 4) the specificity by 16.8%, 10.9%, 6.5%, 4%, and 3.5%, 5) the F-score by 16.6%, 10.7%, 6.6%, 4.1%, and 3.6%.</description><identifier>ISSN: 1064-1246</identifier><identifier>EISSN: 1875-8967</identifier><identifier>DOI: 10.3233/JIFS-200297</identifier><language>eng</language><publisher>Amsterdam: IOS Press BV</publisher><subject>Algorithms ; Clustering ; Data transmission ; Sensitivity</subject><ispartof>Journal of intelligent &amp; fuzzy systems, 2020-01, Vol.39 (3), p.4227-4243</ispartof><rights>Copyright IOS Press BV 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c261t-193ad7995868618213d9f8a82f5c8c75f9c27eefd2f63896ea7e5c8d3ea5a65f3</citedby><cites>FETCH-LOGICAL-c261t-193ad7995868618213d9f8a82f5c8c75f9c27eefd2f63896ea7e5c8d3ea5a65f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Najib, Fatma M.</creatorcontrib><creatorcontrib>Ismail, Rasha M.</creatorcontrib><creatorcontrib>Badr, Nagwa L.</creatorcontrib><creatorcontrib>Gharib, Tarek F.</creatorcontrib><title>Incomplete high dimensional data streams 
clustering</title><title>Journal of intelligent &amp; fuzzy systems</title><description>Many recent applications such as sensor networks generate continuous and time varying data streams that are often gathered from multiple data sources with some incompleteness and high dimensionality. Clustering such incomplete high dimensional streaming data faces four constraints which are 1) data incompleteness, 2) high dimensionality of data, 3) data distribution, 4) data streams’ continuous nature. Thus, in this paper, we propose the Subspace clustering for Incomplete High dimensional Data streams (SIHD) framework that overcomes the above clustering issues. The proposed SIHD provides continuous missing values imputation for incomplete streams based on the corresponding nearest-neighbors’ intervals. An adaptive subspace clustering mechanism is proposed to deal with such incomplete high dimensional data streams. Our experimental results using two different data sets prove the efficiency of the proposed SIHD framework in clustering such incomplete high dimensional data streams in terms of accuracy, precision, sensitivity, specificity, and F-score compared to five algorithms GFCM, GBDC-P2P, DS, Ensemble, and DMSC. 
The proposed SIHD improved: 1) the accuracy on average over the five algorithms in the same mentioned order by 11.3%, 10.8%, 6.5%, 4.1%, and 3.6%, 2) the precision by 15%, 10.6%, 6.4%, 4%, and 3.5%, 3) the sensitivity by 16.6%, 10.6%, 5.8%, 4.2%, and 3.6%, 4) the specificity by 16.8%, 10.9%, 6.5%, 4%, and 3.5%, 5) the F-score by 16.6%, 10.7%, 6.6%, 4.1%, and 3.6%.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Data transmission</subject><subject>Sensitivity</subject><issn>1064-1246</issn><issn>1875-8967</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNotkEtLAzEcxIMoWKsnv8CCR1nN-3GUYutKwYN6DiH7T7tlXybZg9_eLetpBmYYhh9C9wQ_McrY83u1_SwpxtSoC7QiWolSG6kuZ48lLwnl8hrdpHTCmChB8QqxqvdDN7aQoTg2h2NRNx30qRl61xa1y65IOYLrUuHbKWWITX-4RVfBtQnu_nWNvrevX5u3cv-xqzYv-9JTSXJJDHO1MkZoqSXRlLDaBO00DcJrr0QwniqAUNMg2XwTnII5qRk44aQIbI0elt0xDj8TpGxPwxTnY8lSzg0X0jA-tx6Xlo9DShGCHWPTufhrCbZnKvZMxS5U2B_wUlQW</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Najib, Fatma M.</creator><creator>Ismail, Rasha M.</creator><creator>Badr, Nagwa L.</creator><creator>Gharib, Tarek F.</creator><general>IOS Press BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20200101</creationdate><title>Incomplete high dimensional data streams clustering</title><author>Najib, Fatma M. ; Ismail, Rasha M. ; Badr, Nagwa L. 
; Gharib, Tarek F.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c261t-193ad7995868618213d9f8a82f5c8c75f9c27eefd2f63896ea7e5c8d3ea5a65f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Data transmission</topic><topic>Sensitivity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Najib, Fatma M.</creatorcontrib><creatorcontrib>Ismail, Rasha M.</creatorcontrib><creatorcontrib>Badr, Nagwa L.</creatorcontrib><creatorcontrib>Gharib, Tarek F.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of intelligent &amp; fuzzy systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Najib, Fatma M.</au><au>Ismail, Rasha M.</au><au>Badr, Nagwa L.</au><au>Gharib, Tarek F.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Incomplete high dimensional data streams clustering</atitle><jtitle>Journal of intelligent &amp; fuzzy systems</jtitle><date>2020-01-01</date><risdate>2020</risdate><volume>39</volume><issue>3</issue><spage>4227</spage><epage>4243</epage><pages>4227-4243</pages><issn>1064-1246</issn><eissn>1875-8967</eissn><abstract>Many recent applications such as sensor networks generate continuous and time varying data streams that are often gathered from multiple data sources with some incompleteness and high dimensionality. 
Clustering such incomplete high dimensional streaming data faces four constraints which are 1) data incompleteness, 2) high dimensionality of data, 3) data distribution, 4) data streams’ continuous nature. Thus, in this paper, we propose the Subspace clustering for Incomplete High dimensional Data streams (SIHD) framework that overcomes the above clustering issues. The proposed SIHD provides continuous missing values imputation for incomplete streams based on the corresponding nearest-neighbors’ intervals. An adaptive subspace clustering mechanism is proposed to deal with such incomplete high dimensional data streams. Our experimental results using two different data sets prove the efficiency of the proposed SIHD framework in clustering such incomplete high dimensional data streams in terms of accuracy, precision, sensitivity, specificity, and F-score compared to five algorithms GFCM, GBDC-P2P, DS, Ensemble, and DMSC. The proposed SIHD improved: 1) the accuracy on average over the five algorithms in the same mentioned order by 11.3%, 10.8%, 6.5%, 4.1%, and 3.6%, 2) the precision by 15%, 10.6%, 6.4%, 4%, and 3.5%, 3) the sensitivity by 16.6%, 10.6%, 5.8%, 4.2%, and 3.6%, 4) the specificity by 16.8%, 10.9%, 6.5%, 4%, and 3.5%, 5) the F-score by 16.6%, 10.7%, 6.6%, 4.1%, and 3.6%.</abstract><cop>Amsterdam</cop><pub>IOS Press BV</pub><doi>10.3233/JIFS-200297</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1064-1246
ispartof Journal of intelligent & fuzzy systems, 2020-01, Vol.39 (3), p.4227-4243
issn 1064-1246
1875-8967
language eng
recordid cdi_proquest_journals_2449456934
source EBSCOhost Business Source Complete
subjects Algorithms
Clustering
Data transmission
Sensitivity
title Incomplete high dimensional data streams clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T21%3A18%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Incomplete%20high%20dimensional%20data%20streams%20clustering&rft.jtitle=Journal%20of%20intelligent%20&%20fuzzy%20systems&rft.au=Najib,%20Fatma%20M.&rft.date=2020-01-01&rft.volume=39&rft.issue=3&rft.spage=4227&rft.epage=4243&rft.pages=4227-4243&rft.issn=1064-1246&rft.eissn=1875-8967&rft_id=info:doi/10.3233/JIFS-200297&rft_dat=%3Cproquest_cross%3E2449456934%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2449456934&rft_id=info:pmid/&rfr_iscdi=true