Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN

Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with growing size and varied internal organization, non-parametric unsupervised procedures are becoming ever more important in understanding datasets. In this p...

Detailed description

Bibliographic details
Published in: Knowledge-based systems, 2022-04, Vol. 241, p. 108288, Article 108288
Main authors: Ros, Frédéric; Guillaume, Serge; Riad, Rabia; El Hajji, Mohamed
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 108288
container_title Knowledge-based systems
container_volume 241
creator Ros, Frédéric
Guillaume, Serge
Riad, Rabia
El Hajji, Mohamed
description Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with growing size and varied internal organization, non-parametric unsupervised procedures are becoming ever more important in understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery (sample code is, or will be when ready for publication, available at: http://frederic.rosresearch.free.fr/mydata/homepage/). Like DBSCAN, S-DBSCAN belongs to the connectivity-based family, but with noticeable differences and advantages, as it works in a differential mode. It is formalized via a very simple hierarchical process that hybridizes distance, k-nearest-neighbor and density-peaks concepts. It aims at partitioning existing data into clusters until no more clustering can be done. The information delivered allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithm core (S-DBSCANCORE) with judicious input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have too different densities are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional space when natural clusters exist, and much easier to use than competing algorithms.
doi_str_mv 10.1016/j.knosys.2022.108288
format Article
sourceid proquest_hal_p
sourcetype Open Access Repository
sourcerecordid 2642939671
els_id S0950705122000946
publisher Amsterdam: Elsevier B.V
date 2022-04-06
rights 2022 Elsevier B.V.
Copyright Elsevier Science Ltd. Apr 6, 2022
Attribution - NonCommercial
toplevel peer_reviewed
online_resources
oa free_for_read
orcidid https://orcid.org/0000-0002-0327-8249
https://orcid.org/0000-0001-9954-8399
https://orcid.org/0000-0001-8626-213X
linktohtml https://www.sciencedirect.com/science/article/pii/S0950705122000946
backlink https://hal.inrae.fr/hal-03689167 (View record in HAL)
collection CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Library & Information Sciences Abstracts (LISA)
Library & Information Science Abstracts (LISA)
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Hyper Article en Ligne (HAL)
Hyper Article en Ligne (HAL) (Open Access)
fulltext fulltext
identifier ISSN: 0950-7051
ispartof Knowledge-based systems, 2022-04, Vol.241, p.108288, Article 108288
issn 0950-7051
1872-7409
language eng
recordid cdi_hal_primary_oai_HAL_hal_03689167v1
source Elsevier ScienceDirect Journals
subjects Algorithms
Clustering
Datasets
Density
Distance
Life Sciences
Natural cluster
Neighbors
Self tuning
title Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T05%3A06%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detection%20of%20natural%20clusters%20via%20S-DBSCAN%20a%20Self-tuning%20version%20of%20DBSCAN&rft.jtitle=Knowledge-based%20systems&rft.au=Ros,%20Fr%C3%A9d%C3%A9ric&rft.date=2022-04-06&rft.volume=241&rft.spage=108288&rft.pages=108288-&rft.artnum=108288&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2022.108288&rft_dat=%3Cproquest_hal_p%3E2642939671%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2642939671&rft_id=info:pmid/&rft_els_id=S0950705122000946&rfr_iscdi=true
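
The description field of this record says that S-DBSCANCORE groups data patterns that are closely packed with respect to a differential density, and marks as borders the patterns whose nearest neighbors have too different densities. The following is only a minimal Python sketch of that general idea, not the authors' algorithm. The function names `knn_density` and `differential_density_clusters`, the density estimate (inverse of the mean distance to the k nearest neighbors), the hand-set parameters `k` and `max_ratio`, and the union-find linking are all assumptions made for illustration. In particular, the record describes S-DBSCAN as self-tuning, whereas this sketch takes fixed parameters.

```python
import numpy as np


def knn_density(points, k):
    """Return (neighbor indices, density) using brute-force k-nearest neighbors.

    Density here is an assumed estimate: 1 / mean distance to the k neighbors.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, neighbors, axis=1)
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)
    return neighbors, density


def differential_density_clusters(points, k=4, max_ratio=2.0):
    """Hypothetical illustration: label points by linking neighbors of similar density.

    A point is marked as a border (label -1) when its density differs from the
    median density of its k nearest neighbors by more than a factor max_ratio.
    Non-border points are merged with their non-border neighbors.
    """
    neighbors, density = knn_density(points, k)
    n = len(points)

    # Compare each point's density with that of its neighborhood.
    neighbor_median = np.median(density[neighbors], axis=1)
    ratio = (np.maximum(density, neighbor_median)
             / np.minimum(density, neighbor_median))
    is_border = ratio > max_ratio

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        if is_border[i]:
            continue
        for j in neighbors[i]:
            if not is_border[j]:
                parent[find(i)] = find(j)

    # Number the clusters in order of first appearance; borders stay at -1.
    labels = np.full(n, -1)
    roots = {}
    for i in range(n):
        if not is_border[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    return labels


# Two regular grids with different spacings, plus one isolated point.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
points = np.vstack([grid, 0.5 * grid + 100.0, [[50.0, 50.0]]])
labels = differential_density_clusters(points, k=4, max_ratio=2.0)
```

In this toy input, each grid is internally homogeneous even though the two grids have different densities, so the comparison is purely local and no global density threshold is involved. The isolated point is far sparser than its nearest neighbors and is therefore marked as a border.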