Parallel Density-Based Clustering of Complex Objects

In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we will show how these simple distan...

Bibliographic Details
Main authors:	Brecheisen, Stefan; Kriegel, Hans-Peter; Pfeifle, Martin
Format:	Book chapter
Language:	eng
Subjects:	Applied sciences; Computer science, control theory, systems; Data processing. List processing. Character string processing; Exact sciences and technology; Information systems. Data bases; Memory organisation. Data processing; Software
Online access:	Full text

container_end_page	188
container_issue
container_start_page	179
container_title
container_volume
creator	Brecheisen, Stefan; Kriegel, Hans-Peter; Pfeifle, Martin
description	In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we will show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithms, the clustering on the slaves can be carried out very efficiently. Finally, we show that the different result sets computed by the various slaves can effectively and efficiently be merged to a global result by means of cluster connectivity graphs. In an experimental evaluation based on real-world test data sets, we demonstrate the benefits of our approach.
doi_str_mv	10.1007/11731139_22
format	Book Chapter
fullrecord	<record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_19687459</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>19687459</sourcerecordid><originalsourceid>FETCH-LOGICAL-p256t-da044251ee01e0c587cc9f52da652855db7060af94ea89d749bbef78be03992c3</originalsourceid><addsrcrecordid>eNpVkLtOw0AURJeXRBRc8QNuKCgM9-7D6y3B4SVFCgXUq_X6GjlsYstrJPL3OAoFTDPFHI1Gw9glwg0C6FtELRCFsZwfscToQigJQnDQ-pjNMEfMhJDm5F-WwymbgQCeGS3FOUtiXMMkgfnUOWPy1Q0uBArpgraxHXfZvYtUp2X4iiMN7fYj7Zq07DZ9oO90Va3Jj_GCnTUuREp-fc7eHx_eyudsuXp6Ke-WWc9VPma1Aym5QiJAAq8K7b1pFK9drnihVF3paZ9rjCRXmFpLU1XU6KIiEMZwL-bs6tDbu-hdaAa39W20_dBu3LCzaPJCS2Um7vrAxX6_mAZbdd1ntAh2_5z985z4AQCLWTc</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype></control><display><type>book_chapter</type><title>Parallel Density-Based Clustering of Complex Objects</title><source>Springer Books</source><creator>Brecheisen, Stefan ; Kriegel, Hans-Peter ; Pfeifle, Martin</creator><contributor>Kitsuregawa, Masaru ; Ng, Wee-Keong ; Chang, Kuiyu ; Li, Jianzhong</contributor><creatorcontrib>Brecheisen, Stefan ; Kriegel, Hans-Peter ; Pfeifle, Martin ; Kitsuregawa, Masaru ; Ng, Wee-Keong ; Chang, Kuiyu ; Li, Jianzhong</creatorcontrib><description>In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we will show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. 
We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithms, the clustering on the slaves can be carried out very efficiently. Finally, we show that the different result sets computed by the various slaves can effectively and efficiently be merged to a global result by means of cluster connectivity graphs. In an experimental evaluation based on real-world test data sets, we demonstrate the benefits of our approach.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540332060</identifier><identifier>ISBN: 3540332065</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540332077</identifier><identifier>EISBN: 3540332073</identifier><identifier>DOI: 10.1007/11731139_22</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Applied sciences ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Exact sciences and technology ; Information systems. Data bases ; Memory organisation. 
Data processing ; Software</subject><ispartof>Advances in Knowledge Discovery and Data Mining, 2006, p.179-188</ispartof><rights>Springer-Verlag Berlin Heidelberg 2006</rights><rights>2007 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11731139_22$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11731139_22$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>310,311,780,781,785,790,791,794,4051,4052,27930,38260,41447,42516</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=19687459$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Kitsuregawa, Masaru</contributor><contributor>Ng, Wee-Keong</contributor><contributor>Chang, Kuiyu</contributor><contributor>Li, Jianzhong</contributor><creatorcontrib>Brecheisen, Stefan</creatorcontrib><creatorcontrib>Kriegel, Hans-Peter</creatorcontrib><creatorcontrib>Pfeifle, Martin</creatorcontrib><title>Parallel Density-Based Clustering of Complex Objects</title><title>Advances in Knowledge Discovery and Data Mining</title><description>In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we will show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. 
First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithms, the clustering on the slaves can be carried out very efficiently. Finally, we show that the different result sets computed by the various slaves can effectively and efficiently be merged to a global result by means of cluster connectivity graphs. In an experimental evaluation based on real-world test data sets, we demonstrate the benefits of our approach.</description><subject>Applied sciences</subject><subject>Computer science; control theory; systems</subject><subject>Data processing. List processing. Character string processing</subject><subject>Exact sciences and technology</subject><subject>Information systems. Data bases</subject><subject>Memory organisation. 
Data processing</subject><subject>Software</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540332060</isbn><isbn>3540332065</isbn><isbn>9783540332077</isbn><isbn>3540332073</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2006</creationdate><recordtype>book_chapter</recordtype><recordid>eNpVkLtOw0AURJeXRBRc8QNuKCgM9-7D6y3B4SVFCgXUq_X6GjlsYstrJPL3OAoFTDPFHI1Gw9glwg0C6FtELRCFsZwfscToQigJQnDQ-pjNMEfMhJDm5F-WwymbgQCeGS3FOUtiXMMkgfnUOWPy1Q0uBArpgraxHXfZvYtUp2X4iiMN7fYj7Zq07DZ9oO90Va3Jj_GCnTUuREp-fc7eHx_eyudsuXp6Ke-WWc9VPma1Aym5QiJAAq8K7b1pFK9drnihVF3paZ9rjCRXmFpLU1XU6KIiEMZwL-bs6tDbu-hdaAa39W20_dBu3LCzaPJCS2Um7vrAxX6_mAZbdd1ntAh2_5z985z4AQCLWTc</recordid><startdate>2006</startdate><enddate>2006</enddate><creator>Brecheisen, Stefan</creator><creator>Kriegel, Hans-Peter</creator><creator>Pfeifle, Martin</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2006</creationdate><title>Parallel Density-Based Clustering of Complex Objects</title><author>Brecheisen, Stefan ; Kriegel, Hans-Peter ; Pfeifle, Martin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p256t-da044251ee01e0c587cc9f52da652855db7060af94ea89d749bbef78be03992c3</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Applied sciences</topic><topic>Computer science; control theory; systems</topic><topic>Data processing. List processing. Character string processing</topic><topic>Exact sciences and technology</topic><topic>Information systems. Data bases</topic><topic>Memory organisation. 
Data processing</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Brecheisen, Stefan</creatorcontrib><creatorcontrib>Kriegel, Hans-Peter</creatorcontrib><creatorcontrib>Pfeifle, Martin</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Brecheisen, Stefan</au><au>Kriegel, Hans-Peter</au><au>Pfeifle, Martin</au><au>Kitsuregawa, Masaru</au><au>Ng, Wee-Keong</au><au>Chang, Kuiyu</au><au>Li, Jianzhong</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>Parallel Density-Based Clustering of Complex Objects</atitle><btitle>Advances in Knowledge Discovery and Data Mining</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2006</date><risdate>2006</risdate><spage>179</spage><epage>188</epage><pages>179-188</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540332060</isbn><isbn>3540332065</isbn><eisbn>9783540332077</eisbn><eisbn>3540332073</eisbn><abstract>In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we will show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithms, the clustering on the slaves can be carried out very efficiently. 
Finally, we show that the different result sets computed by the various slaves can effectively and efficiently be merged to a global result by means of cluster connectivity graphs. In an experimental evaluation based on real-world test data sets, we demonstrate the benefits of our approach.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11731139_22</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Advances in Knowledge Discovery and Data Mining, 2006, p.179-188
issn	0302-9743 1611-3349
language	eng
recordid	cdi_pascalfrancis_primary_19687459
source	Springer Books
subjects	Applied sciences; Computer science, control theory, systems; Data processing. List processing. Character string processing; Exact sciences and technology; Information systems. Data bases; Memory organisation. Data processing; Software
title	Parallel Density-Based Clustering of Complex Objects
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T10%3A31%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Parallel%20Density-Based%20Clustering%20of%20Complex%20Objects&rft.btitle=Advances%20in%20Knowledge%20Discovery%20and%20Data%20Mining&rft.au=Brecheisen,%20Stefan&rft.date=2006&rft.spage=179&rft.epage=188&rft.pages=179-188&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540332060&rft.isbn_list=3540332065&rft_id=info:doi/10.1007/11731139_22&rft_dat=%3Cpascalfrancis_sprin%3E19687459%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540332077&rft.eisbn_list=3540332073&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true