Vectorized Highly Parallel Density-Based Clustering for Applications With Noise

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, pp. 181679-181692
Main authors: Xavier, Joseph Arnold; Pedro Gutierrez Hermosillo Muriedas, Juan; Nassyr, Stepan; Sedona, Rocco; Gotz, Markus; Streit, Achim; Riedel, Morris; Cavallaro, Gabriele
Format: Article
Language: English
Online access: Full text
description Clustering in data mining groups similar objects into categories based on their characteristics. As data volumes grow and high-performance computing advances, there is a critical need for algorithms that process these computations efficiently and exploit the multiple levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, algorithms must be adapted to map well onto vector operations. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for execution on both shared- and distributed-memory systems. By leveraging SIMD, we accelerate the distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) is up to twice as fast as the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor, on two datasets of differing dimensionality. We have also parallelized the computations that are essential for efficient workload distribution, which significantly improves performance on higher-dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. The results show that on both processors the reduced runtime lowers the application's total energy consumption compared to HPDBSCAN: by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU.
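The abstract attributes VHPDBSCAN's speedup to SIMD-accelerated distance computations. As an illustration only, the sketch below is a plain sequential DBSCAN in Python; it is not the authors' VHPDBSCAN code, which this record does not include. Its neighborhood query is written in the shape vectorization favors: coordinates stored column-wise (structure of arrays), squared distances compared against eps² so no square root is needed, and an inner loop that applies identical arithmetic to every point. Python itself does not emit SIMD instructions; the assumption is that the same loop shape over contiguous float arrays in C or C++ is what a compiler could auto-vectorize, for example for SVE on the A64FX. The names `neighbors_soa`, `dbscan`, `eps`, and `min_pts` are hypothetical.

```python
from typing import List, Sequence

NOISE = -1
UNVISITED = -2


def neighbors_soa(cols: Sequence[Sequence[float]], q: int, eps: float) -> List[int]:
    """Return indices of all points within eps of point q (q included).

    cols[d][i] is coordinate d of point i (structure-of-arrays layout).
    Squared distances are compared to eps**2, so no sqrt is needed.
    """
    n = len(cols[0])
    eps2 = eps * eps
    dist2 = [0.0] * n
    # Dimension-major outer loop: the inner loop walks one column and does
    # the same subtract/multiply/add for every point. With contiguous float
    # arrays in C, this is the loop shape a compiler could map onto SIMD
    # lanes (an assumption for illustration; Python runs it scalar).
    for col in cols:
        c = col[q]
        for i in range(n):
            diff = col[i] - c
            dist2[i] += diff * diff
    return [i for i in range(n) if dist2[i] <= eps2]


def dbscan(cols: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Sequential DBSCAN: returns a cluster id per point, -1 for noise.

    min_pts counts the point itself, as in the usual DBSCAN definition.
    """
    n = len(cols[0])
    labels = [UNVISITED] * n
    cluster = 0
    for p in range(n):
        if labels[p] != UNVISITED:
            continue
        seeds = neighbors_soa(cols, p, eps)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later be relabeled as a border point
            continue
        labels[p] = cluster  # p is a core point and starts a new cluster
        queue = [s for s in seeds if s != p]
        while queue:
            s = queue.pop()
            if labels[s] == NOISE:
                labels[s] = cluster  # border point: joins, but is not expanded
            if labels[s] != UNVISITED:
                continue
            labels[s] = cluster
            nbrs = neighbors_soa(cols, s, eps)
            if len(nbrs) >= min_pts:  # s is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels
```

For the two-dimensional sample `[[0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0], [0.0, 0.0, 0.1, 5.0, 5.0, 5.1, 10.0]]` with `eps=0.5` and `min_pts=3`, `dbscan` returns `[0, 0, 0, 1, 1, 1, -1]`: two three-point clusters and one noise point. Comparing squared distances keeps the kernel to multiplies and adds, the kind of uniform arithmetic that SIMD units process efficiently; how VHPDBSCAN actually lays out and vectorizes its data is described in the paper itself.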
doi_str_mv 10.1109/ACCESS.2024.3507193
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_3143028216</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10769413</ieee_id><doaj_id>oai_doaj_org_article_89f509a67be546e0bbae4250b331d8ca</doaj_id><sourcerecordid>3143028216</sourcerecordid><originalsourceid>FETCH-LOGICAL-d200t-75b8a77a543a6667c8329832713734859f148f6b8e94a75473806787bc2206ca3</originalsourceid><addsrcrecordid>eNo9kFtLw0AQhRdBsNT-An0I-Jy698tjjdUWihXq5TFskk27Zc3G3e1D_fWGVhwYBs4cvjMMADcIThGC6n5WFPPNZoohplPCoECKXIARRlzlhBF-BSYx7uFQcpCYGIH1h6mTD_bHNNnCbnfumL3qoJ0zLns0XbTpmD_oOGwLd4jJBNtts9aHbNb3ztY6Wd_F7NOmXfbibTTX4LLVLprJ3xyD96f5W7HIV-vnZTFb5Q2GMOWCVVILoRklmnMuakmwGlogIgiVTLWIypZX0iiqBaOCSMiFFFWNMeS1JmOwPHMbr_dlH-yXDsfSa1ueBB-2pQ7J1s6UUrUMKs1FZRjlBlaVNhQzWBGCGnli3Z1ZffDfBxNTufeH0A3nlwRRArEcHji4bs8ua4z5T0RQcEURIb_s4XBw</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3143028216</pqid></control><display><type>article</type><title>Vectorized Highly Parallel Density-Based Clustering for Applications With Noise</title><source>EZB Free E-Journals</source><source>DOAJ Directory of Open Access Journals</source><source>IEEE Xplore Open Access Journals</source><creator>Xavier, Joseph Arnold ; Pedro Gutierrez Hermosillo Muriedas, Juan ; Nassyr, Stepan ; Sedona, Rocco ; Gotz, Markus ; Streit, Achim ; Riedel, Morris ; Cavallaro, Gabriele</creator><creatorcontrib>Xavier, Joseph Arnold ; Pedro Gutierrez Hermosillo Muriedas, Juan ; Nassyr, Stepan ; Sedona, Rocco ; Gotz, Markus ; Streit, Achim ; Riedel, Morris ; Cavallaro, Gabriele</creatorcontrib><description>Clustering in data mining involves grouping similar objects into categories based on their characteristics. 
As the volume of data continues to grow and advancements in high-performance computing evolve, a critical need has emerged for algorithms that can efficiently process these computations and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, adapting algorithms for better compatibility with vector operations is necessary. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for the execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have parallelized computations which are essential for the efficient workload distribution. This has significantly enhanced the performance on higher dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. 
The results show that in both processors, due to the reduced runtime, the total energy consumption of the application is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU compared to HPDBSCAN.</description><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3507193</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Central Processing Unit ; Central processing units ; Clustering ; Clustering algorithms ; Computational efficiency ; CPUs ; Data mining ; Datasets ; Density ; density-based clustering ; Distributed memory ; Energy consumption ; High performance computing ; Indexing ; Merging ; Microprocessors ; Noise ; Parallel processing ; Performance enhancement ; Performance evaluation ; Processors ; SIMD (computers) ; Single instruction multiple data ; Time complexity ; vectorization ; Vectors ; VHPDBSCAN</subject><ispartof>IEEE access, 2024, Vol.12, p.181679-181692</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3239-9904 ; 0009-0007-5215-6022 ; 0000-0003-4089-972X ; 0000-0002-2233-1041 ; 0000-0002-5065-469X ; 0000-0001-8439-7145 ; 0000-0003-1810-9330</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10769413$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Xavier, Joseph Arnold</creatorcontrib><creatorcontrib>Pedro Gutierrez Hermosillo Muriedas, Juan</creatorcontrib><creatorcontrib>Nassyr, Stepan</creatorcontrib><creatorcontrib>Sedona, Rocco</creatorcontrib><creatorcontrib>Gotz, Markus</creatorcontrib><creatorcontrib>Streit, Achim</creatorcontrib><creatorcontrib>Riedel, Morris</creatorcontrib><creatorcontrib>Cavallaro, Gabriele</creatorcontrib><title>Vectorized Highly Parallel Density-Based Clustering for Applications With Noise</title><title>IEEE access</title><addtitle>Access</addtitle><description>Clustering in data mining involves grouping similar objects into categories based on their characteristics. As the volume of data continues to grow and advancements in high-performance computing evolve, a critical need has emerged for algorithms that can efficiently process these computations and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, adapting algorithms for better compatibility with vector operations is necessary. 
In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for the execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have parallelized computations which are essential for the efficient workload distribution. This has significantly enhanced the performance on higher dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. The results show that in both processors, due to the reduced runtime, the total energy consumption of the application is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU compared to HPDBSCAN.</description><subject>Algorithms</subject><subject>Central Processing Unit</subject><subject>Central processing units</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Computational efficiency</subject><subject>CPUs</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Density</subject><subject>density-based clustering</subject><subject>Distributed memory</subject><subject>Energy consumption</subject><subject>High performance computing</subject><subject>Indexing</subject><subject>Merging</subject><subject>Microprocessors</subject><subject>Noise</subject><subject>Parallel processing</subject><subject>Performance enhancement</subject><subject>Performance evaluation</subject><subject>Processors</subject><subject>SIMD (computers)</subject><subject>Single instruction multiple data</subject><subject>Time 
complexity</subject><subject>vectorization</subject><subject>Vectors</subject><subject>VHPDBSCAN</subject><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNo9kFtLw0AQhRdBsNT-An0I-Jy698tjjdUWihXq5TFskk27Zc3G3e1D_fWGVhwYBs4cvjMMADcIThGC6n5WFPPNZoohplPCoECKXIARRlzlhBF-BSYx7uFQcpCYGIH1h6mTD_bHNNnCbnfumL3qoJ0zLns0XbTpmD_oOGwLd4jJBNtts9aHbNb3ztY6Wd_F7NOmXfbibTTX4LLVLprJ3xyD96f5W7HIV-vnZTFb5Q2GMOWCVVILoRklmnMuakmwGlogIgiVTLWIypZX0iiqBaOCSMiFFFWNMeS1JmOwPHMbr_dlH-yXDsfSa1ueBB-2pQ7J1s6UUrUMKs1FZRjlBlaVNhQzWBGCGnli3Z1ZffDfBxNTufeH0A3nlwRRArEcHji4bs8ua4z5T0RQcEURIb_s4XBw</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Xavier, Joseph Arnold</creator><creator>Pedro Gutierrez Hermosillo Muriedas, Juan</creator><creator>Nassyr, Stepan</creator><creator>Sedona, Rocco</creator><creator>Gotz, Markus</creator><creator>Streit, Achim</creator><creator>Riedel, Morris</creator><creator>Cavallaro, Gabriele</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-3239-9904</orcidid><orcidid>https://orcid.org/0009-0007-5215-6022</orcidid><orcidid>https://orcid.org/0000-0003-4089-972X</orcidid><orcidid>https://orcid.org/0000-0002-2233-1041</orcidid><orcidid>https://orcid.org/0000-0002-5065-469X</orcidid><orcidid>https://orcid.org/0000-0001-8439-7145</orcidid><orcidid>https://orcid.org/0000-0003-1810-9330</orcidid></search><sort><creationdate>2024</creationdate><title>Vectorized Highly Parallel Density-Based Clustering for Applications With Noise</title><author>Xavier, Joseph Arnold ; Pedro Gutierrez Hermosillo Muriedas, Juan ; Nassyr, Stepan ; Sedona, Rocco ; Gotz, Markus ; Streit, Achim ; Riedel, Morris ; Cavallaro, Gabriele</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d200t-75b8a77a543a6667c8329832713734859f148f6b8e94a75473806787bc2206ca3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Central Processing Unit</topic><topic>Central processing units</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Computational efficiency</topic><topic>CPUs</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Density</topic><topic>density-based clustering</topic><topic>Distributed memory</topic><topic>Energy consumption</topic><topic>High performance computing</topic><topic>Indexing</topic><topic>Merging</topic><topic>Microprocessors</topic><topic>Noise</topic><topic>Parallel processing</topic><topic>Performance enhancement</topic><topic>Performance evaluation</topic><topic>Processors</topic><topic>SIMD (computers)</topic><topic>Single 
instruction multiple data</topic><topic>Time complexity</topic><topic>vectorization</topic><topic>Vectors</topic><topic>VHPDBSCAN</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xavier, Joseph Arnold</creatorcontrib><creatorcontrib>Pedro Gutierrez Hermosillo Muriedas, Juan</creatorcontrib><creatorcontrib>Nassyr, Stepan</creatorcontrib><creatorcontrib>Sedona, Rocco</creatorcontrib><creatorcontrib>Gotz, Markus</creatorcontrib><creatorcontrib>Streit, Achim</creatorcontrib><creatorcontrib>Riedel, Morris</creatorcontrib><creatorcontrib>Cavallaro, Gabriele</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xavier, Joseph Arnold</au><au>Pedro Gutierrez Hermosillo Muriedas, Juan</au><au>Nassyr, Stepan</au><au>Sedona, Rocco</au><au>Gotz, Markus</au><au>Streit, Achim</au><au>Riedel, Morris</au><au>Cavallaro, 
Gabriele</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Vectorized Highly Parallel Density-Based Clustering for Applications With Noise</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2024</date><risdate>2024</risdate><volume>12</volume><spage>181679</spage><epage>181692</epage><pages>181679-181692</pages><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Clustering in data mining involves grouping similar objects into categories based on their characteristics. As the volume of data continues to grow and advancements in high-performance computing evolve, a critical need has emerged for algorithms that can efficiently process these computations and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, adapting algorithms for better compatibility with vector operations is necessary. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for the execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have parallelized computations which are essential for the efficient workload distribution. This has significantly enhanced the performance on higher dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. 
The results show that in both processors, due to the reduced runtime, the total energy consumption of the application is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU compared to HPDBSCAN.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3507193</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-3239-9904</orcidid><orcidid>https://orcid.org/0009-0007-5215-6022</orcidid><orcidid>https://orcid.org/0000-0003-4089-972X</orcidid><orcidid>https://orcid.org/0000-0002-2233-1041</orcidid><orcidid>https://orcid.org/0000-0002-5065-469X</orcidid><orcidid>https://orcid.org/0000-0001-8439-7145</orcidid><orcidid>https://orcid.org/0000-0003-1810-9330</orcidid><oa>free_for_read</oa></addata></record>
identifier EISSN: 2169-3536
issn 2169-3536
source EZB Free E-Journals; DOAJ Directory of Open Access Journals; IEEE Xplore Open Access Journals
subjects Algorithms
Central Processing Unit
Central processing units
Clustering
Clustering algorithms
Computational efficiency
CPUs
Data mining
Datasets
Density
density-based clustering
Distributed memory
Energy consumption
High performance computing
Indexing
Merging
Microprocessors
Noise
Parallel processing
Performance enhancement
Performance evaluation
Processors
SIMD (computers)
Single instruction multiple data
Time complexity
vectorization
Vectors
VHPDBSCAN
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Vectorized%20Highly%20Parallel%20Density-Based%20Clustering%20for%20Applications%20With%20Noise&rft.jtitle=IEEE%20access&rft.au=Xavier,%20Joseph%20Arnold&rft.date=2024&rft.volume=12&rft.spage=181679&rft.epage=181692&rft.pages=181679-181692&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3507193&rft_dat=%3Cproquest_doaj_%3E3143028216%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3143028216&rft_id=info:pmid/&rft_ieee_id=10769413&rft_doaj_id=oai_doaj_org_article_89f509a67be546e0bbae4250b331d8ca&rfr_iscdi=true