A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU

We introduce Hybrid-Dbscan, which uses the GPU and CPUs to optimize clustering throughput. The main idea is to exploit the GPU's memory bandwidth for fast index searches and to optimize data transfers between host and GPU, alleviating the potential negative performance impact of the PCIe inte...

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2019-04, Vol. 30 (4), p. 766-777
Main authors: Gowanlock, Michael; Rude, Cody M.; Blair, David M.; Li, Justin D.; Pankratius, Victor
Format: Article
Language: English
container_end_page 777
container_issue 4
container_start_page 766
container_title IEEE transactions on parallel and distributed systems
container_volume 30
creator Gowanlock, Michael
Rude, Cody M.
Blair, David M.
Li, Justin D.
Pankratius, Victor
description We introduce Hybrid-Dbscan, which uses the GPU and CPUs to optimize clustering throughput. The main idea is to exploit the GPU's memory bandwidth for fast index searches and to optimize data transfers between host and GPU, alleviating the potential negative performance impact of the PCIe interconnect. We propose and compare two GPU kernels that exploit grid-based indexing schemes to improve neighborhood search performance. We employ a batching scheme for host-GPU data transfers to work around limited GPU memory, and exploit concurrent operations on the host and GPU. This scheme is robust with respect to both sparse and dense data distributions and avoids buffer overflows that would otherwise degrade performance. We evaluate our approaches on ionospheric total electron content datasets as well as intermediate-redshift galaxies from the Sloan Digital Sky Survey. Hybrid-Dbscan outperforms the reference implementation across a range of application scenarios, including small workloads, which are typically the domain of CPU-only algorithms. We advance an empirical response time performance model of Hybrid-Dbscan by utilizing the underlying properties of the datasets. With only a single execution of Hybrid-Dbscan on a dataset, we are able to accurately predict the response time for a range of ε search distances.
doi_str_mv 10.1109/TPDS.2018.2869777
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2191259648</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8462789</ieee_id><sourcerecordid>2191259648</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-17c73a1a062f5925549a418655d596a2f10eb8a558ef60496140b8900c12ee2e3</originalsourceid><addsrcrecordid>eNo9UMtuwjAQtKpWKn18QNWLpZ5DvY7t2EdEW6hEBVLhbJmwgaBAUjs50K-vI1BPuxrNzM4OIU_AhgDMvC4Xb99DzkAPuVYmy7IrMgApdcJBp9dxZ0ImhoO5JXch7BkDIZkYkK8RnZ7WvtzQUdP42uU7WtSezpu2PJS_5XFLF867qsKKjqsutOh7bLnzdbfdNV1Lu9AD7Q7pZLF6IDeFqwI-XuY9WX28L8fTZDaffI5HsyRPU9UmkOVZ6sAxxQtpuJTCOAFaSbmRRjleAMO1djE-FooJo0CwtTaM5cAROab35OXsGyP_dBhau687f4wnbXwReHQROrLgzMp9HYLHwja-PDh_ssBs35rtW7N9a_bSWtQ8nzUlIv7ztVA80yb9A6xDZws</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2191259648</pqid></control><display><type>article</type><title>A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU</title><source>IEEE Electronic Library (IEL)</source><creator>Gowanlock, Michael ; Rude, Cody M. ; Blair, David M. ; Li, Justin D. ; Pankratius, Victor</creator><creatorcontrib>Gowanlock, Michael ; Rude, Cody M. ; Blair, David M. ; Li, Justin D. ; Pankratius, Victor</creatorcontrib><description><![CDATA[We introduce Hybrid-Dbscan , that uses the GPU and CPUs for optimizing clustering throughput. The main idea is to exploit the memory bandwidth on the GPU for fast index searches, and optimize data transfers between host and GPU, to alleviate the potential negative performance impact of the PCIe interconnect. We propose and compare two GPU kernels that exploit grid-based indexing schemes to improve neighborhood search performance. We employ a batching scheme for host-GPU data transfers to obviate limited GPU memory, and exploit concurrent operations on the host and GPU. 
This scheme is robust with respect to both sparse and dense data distributions and avoids buffer overflows that would otherwise degrade performance. We evaluate our approaches on ionospheric total electron content datasets as well as intermediate-redshift galaxies from the Sloan Digital Sky Survey. Hybrid-Dbscan outperforms the reference implementation across a range of application scenarios, including small workloads, which typically are the domain of CPU-only algorithms. We advance an empirical response time performance model of Hybrid-Dbscan by utilizing the underlying properties of the datasets. With only a single execution of Hybrid-Dbscan on a dataset, we are able to accurately predict the response time for a range of <inline-formula><tex-math notation="LaTeX">\epsilon</tex-math> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> </inline-formula> search distances.]]></description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2018.2869777</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Clustering ; Clustering algorithms ; Datasets ; DBSCAN ; Galaxies ; GPGPU ; Graphics processing units ; in-memory database ; Indexing ; Kernel ; Optimization ; parallel clustering ; Performance degradation ; query optimization ; Red shift ; Response time (computers) ; Sky surveys (astronomy) ; spatial databases ; Throughput ; Time factors</subject><ispartof>IEEE transactions on parallel and distributed systems, 2019-04, Vol.30 (4), p.766-777</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-17c73a1a062f5925549a418655d596a2f10eb8a558ef60496140b8900c12ee2e3</citedby><cites>FETCH-LOGICAL-c336t-17c73a1a062f5925549a418655d596a2f10eb8a558ef60496140b8900c12ee2e3</cites><orcidid>0000-0002-9584-2600 ; 0000-0002-0826-6204 ; 0000-0003-3315-2038 ; 0000-0002-4658-6583</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8462789$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8462789$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Gowanlock, Michael</creatorcontrib><creatorcontrib>Rude, Cody M.</creatorcontrib><creatorcontrib>Blair, David M.</creatorcontrib><creatorcontrib>Li, Justin D.</creatorcontrib><creatorcontrib>Pankratius, Victor</creatorcontrib><title>A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description><![CDATA[We introduce Hybrid-Dbscan , that uses the GPU and CPUs for optimizing clustering throughput. The main idea is to exploit the memory bandwidth on the GPU for fast index searches, and optimize data transfers between host and GPU, to alleviate the potential negative performance impact of the PCIe interconnect. We propose and compare two GPU kernels that exploit grid-based indexing schemes to improve neighborhood search performance. We employ a batching scheme for host-GPU data transfers to obviate limited GPU memory, and exploit concurrent operations on the host and GPU. 
This scheme is robust with respect to both sparse and dense data distributions and avoids buffer overflows that would otherwise degrade performance. We evaluate our approaches on ionospheric total electron content datasets as well as intermediate-redshift galaxies from the Sloan Digital Sky Survey. Hybrid-Dbscan outperforms the reference implementation across a range of application scenarios, including small workloads, which typically are the domain of CPU-only algorithms. We advance an empirical response time performance model of Hybrid-Dbscan by utilizing the underlying properties of the datasets. With only a single execution of Hybrid-Dbscan on a dataset, we are able to accurately predict the response time for a range of <inline-formula><tex-math notation="LaTeX">\epsilon</tex-math> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> </inline-formula> search distances.]]></description><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Datasets</subject><subject>DBSCAN</subject><subject>Galaxies</subject><subject>GPGPU</subject><subject>Graphics processing units</subject><subject>in-memory database</subject><subject>Indexing</subject><subject>Kernel</subject><subject>Optimization</subject><subject>parallel clustering</subject><subject>Performance degradation</subject><subject>query optimization</subject><subject>Red shift</subject><subject>Response time (computers)</subject><subject>Sky surveys (astronomy)</subject><subject>spatial databases</subject><subject>Throughput</subject><subject>Time 
factors</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9UMtuwjAQtKpWKn18QNWLpZ5DvY7t2EdEW6hEBVLhbJmwgaBAUjs50K-vI1BPuxrNzM4OIU_AhgDMvC4Xb99DzkAPuVYmy7IrMgApdcJBp9dxZ0ImhoO5JXch7BkDIZkYkK8RnZ7WvtzQUdP42uU7WtSezpu2PJS_5XFLF867qsKKjqsutOh7bLnzdbfdNV1Lu9AD7Q7pZLF6IDeFqwI-XuY9WX28L8fTZDaffI5HsyRPU9UmkOVZ6sAxxQtpuJTCOAFaSbmRRjleAMO1djE-FooJo0CwtTaM5cAROab35OXsGyP_dBhau687f4wnbXwReHQROrLgzMp9HYLHwja-PDh_ssBs35rtW7N9a_bSWtQ8nzUlIv7ztVA80yb9A6xDZws</recordid><startdate>20190401</startdate><enddate>20190401</enddate><creator>Gowanlock, Michael</creator><creator>Rude, Cody M.</creator><creator>Blair, David M.</creator><creator>Li, Justin D.</creator><creator>Pankratius, Victor</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-9584-2600</orcidid><orcidid>https://orcid.org/0000-0002-0826-6204</orcidid><orcidid>https://orcid.org/0000-0003-3315-2038</orcidid><orcidid>https://orcid.org/0000-0002-4658-6583</orcidid></search><sort><creationdate>20190401</creationdate><title>A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU</title><author>Gowanlock, Michael ; Rude, Cody M. ; Blair, David M. ; Li, Justin D. 
; Pankratius, Victor</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-17c73a1a062f5925549a418655d596a2f10eb8a558ef60496140b8900c12ee2e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Datasets</topic><topic>DBSCAN</topic><topic>Galaxies</topic><topic>GPGPU</topic><topic>Graphics processing units</topic><topic>in-memory database</topic><topic>Indexing</topic><topic>Kernel</topic><topic>Optimization</topic><topic>parallel clustering</topic><topic>Performance degradation</topic><topic>query optimization</topic><topic>Red shift</topic><topic>Response time (computers)</topic><topic>Sky surveys (astronomy)</topic><topic>spatial databases</topic><topic>Throughput</topic><topic>Time factors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gowanlock, Michael</creatorcontrib><creatorcontrib>Rude, Cody M.</creatorcontrib><creatorcontrib>Blair, David M.</creatorcontrib><creatorcontrib>Li, Justin D.</creatorcontrib><creatorcontrib>Pankratius, Victor</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on parallel and distributed 
systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gowanlock, Michael</au><au>Rude, Cody M.</au><au>Blair, David M.</au><au>Li, Justin D.</au><au>Pankratius, Victor</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2019-04-01</date><risdate>2019</risdate><volume>30</volume><issue>4</issue><spage>766</spage><epage>777</epage><pages>766-777</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract><![CDATA[We introduce Hybrid-Dbscan , that uses the GPU and CPUs for optimizing clustering throughput. The main idea is to exploit the memory bandwidth on the GPU for fast index searches, and optimize data transfers between host and GPU, to alleviate the potential negative performance impact of the PCIe interconnect. We propose and compare two GPU kernels that exploit grid-based indexing schemes to improve neighborhood search performance. We employ a batching scheme for host-GPU data transfers to obviate limited GPU memory, and exploit concurrent operations on the host and GPU. This scheme is robust with respect to both sparse and dense data distributions and avoids buffer overflows that would otherwise degrade performance. We evaluate our approaches on ionospheric total electron content datasets as well as intermediate-redshift galaxies from the Sloan Digital Sky Survey. Hybrid-Dbscan outperforms the reference implementation across a range of application scenarios, including small workloads, which typically are the domain of CPU-only algorithms. We advance an empirical response time performance model of Hybrid-Dbscan by utilizing the underlying properties of the datasets. 
With only a single execution of Hybrid-Dbscan on a dataset, we are able to accurately predict the response time for a range of <inline-formula><tex-math notation="LaTeX">\epsilon</tex-math> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> <mml:math> <mml:mi>ε</mml:mi> </mml:math> <inline-graphic xlink:href="gowanlock-ieq1-2869777.gif"/> </inline-formula> search distances.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2018.2869777</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-9584-2600</orcidid><orcidid>https://orcid.org/0000-0002-0826-6204</orcidid><orcidid>https://orcid.org/0000-0003-3315-2038</orcidid><orcidid>https://orcid.org/0000-0002-4658-6583</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1045-9219
ispartof IEEE transactions on parallel and distributed systems, 2019-04, Vol.30 (4), p.766-777
issn 1045-9219
1558-2183
language eng
recordid cdi_proquest_journals_2191259648
source IEEE Electronic Library (IEL)
subjects Algorithms
Clustering
Clustering algorithms
Datasets
DBSCAN
Galaxies
GPGPU
Graphics processing units
in-memory database
Indexing
Kernel
Optimization
parallel clustering
Performance degradation
query optimization
Red shift
Response time (computers)
Sky surveys (astronomy)
spatial databases
Throughput
Time factors
title A Hybrid Approach for Optimizing Parallel Clustering Throughput using the GPU
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T10%3A02%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Hybrid%20Approach%20for%20Optimizing%20Parallel%20Clustering%20Throughput%20using%20the%20GPU&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Gowanlock,%20Michael&rft.date=2019-04-01&rft.volume=30&rft.issue=4&rft.spage=766&rft.epage=777&rft.pages=766-777&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2018.2869777&rft_dat=%3Cproquest_RIE%3E2191259648%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2191259648&rft_id=info:pmid/&rft_ieee_id=8462789&rfr_iscdi=true
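
The description above mentions grid-based indexing to speed up ε-neighborhood searches. The following is a minimal CPU-only Python sketch of that general idea, not the authors' GPU kernels: points are binned into cells of side ε, so a neighborhood query only has to examine the query point's cell and its adjacent cells. The function names `build_grid` and `eps_neighbors` and the sample points are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Map each eps-sized grid cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(c / eps) for c in p)
        grid[cell].append(i)
    return grid


def eps_neighbors(points, grid, eps, i):
    """Return sorted indices of points within eps of points[i], including i."""
    p = points[i]
    home = tuple(math.floor(c / eps) for c in p)
    found = []
    # Any point within eps must lie in the home cell or a directly adjacent cell.
    for offset in product((-1, 0, 1), repeat=len(p)):
        cell = tuple(h + o for h, o in zip(home, offset))
        for j in grid.get(cell, ()):
            if math.dist(p, points[j]) <= eps:
                found.append(j)
    return sorted(found)


# Hypothetical 2-D sample data (assumption, not taken from the paper).
points = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.9), (3.0, 3.0)]
grid = build_grid(points, eps=1.0)
print(eps_neighbors(points, grid, 1.0, 0))
```

Because the cell side equals ε, the candidate set is bounded by the contents of 3^d cells in d dimensions rather than the whole dataset; a DBSCAN implementation would then compare each neighborhood size against its minimum-points threshold.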