Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN

Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with increasing size and varied internal structures, non-parametric unsupervised procedures are becoming ever more important for understanding datasets. In this p...

Detailed description


Bibliographic details
Published in:	Knowledge-based systems 2022-04, Vol.241, p.108288, Article 108288
Main authors:	Ros, Frédéric; Guillaume, Serge; Riad, Rabia; El Hajji, Mohamed
Format:	Article
Language:	eng
Subjects:	Algorithms; Clustering; Datasets; Density; Distance; Life Sciences; Natural cluster; Neighbors; Self tuning
Online access:	Full text

container_end_page
container_issue
container_start_page	108288
container_title	Knowledge-based systems
container_volume	241
creator	Ros, Frédéric; Guillaume, Serge; Riad, Rabia; El Hajji, Mohamed
description	Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with increasing size and varied internal structures, non-parametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery (sample code will be available, when ready for publication, at http://frederic.rosresearch.free.fr/mydata/homepage/). S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with notable differences and advantages, such as working in a differential mode. It is formalized as a very simple hierarchical process that combines distance, k-nearest-neighbor and density-peak concepts. It aims at partitioning the data into clusters until no further clustering can be done. The information it delivers allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithmic core (S-DBSCANCORE) with judiciously chosen input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have densities that are too different are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.
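The abstract describes S-DBSCANCORE only in words: points are grouped by differential density, and points whose nearest neighbors have too different densities are marked as borders. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm, whose code is not reproduced in this record. The density estimator (inverse mean k-NN distance), the border rule (a neighbor density ratio above a hypothetical `max_ratio` threshold), the DBSCAN-style expansion, and all function names are illustrative assumptions.

```python
import math
from collections import deque


def knn(points, k):
    """Indices of the k nearest neighbours of every point (brute force, O(n^2 log n))."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort (distance, index) pairs; ties are broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def knn_density(points, neighbours):
    """Assumed density estimate: inverse of the mean distance to the k nearest neighbours."""
    densities = []
    for i, nbrs in enumerate(neighbours):
        mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        densities.append(1.0 / max(mean_d, 1e-12))  # guard against duplicate points
    return densities


def differential_density_clusters(points, k=4, max_ratio=2.0):
    """Hypothetical differential-density grouping (NOT the authors' S-DBSCANCORE).

    A point is marked as a border when the density of one of its k nearest
    neighbours differs from its own by more than a factor `max_ratio`.
    Clusters grow DBSCAN-style: only non-border points propagate the label.
    Returns (labels, is_border); a label of -1 means "unassigned".
    """
    neighbours = knn(points, k)
    dens = knn_density(points, neighbours)
    is_border = [
        any(max(dens[i], dens[j]) / min(dens[i], dens[j]) > max_ratio for j in nbrs)
        for i, nbrs in enumerate(neighbours)
    ]
    labels = [-1] * len(points)
    next_label = 0
    for seed in range(len(points)):
        if is_border[seed] or labels[seed] != -1:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    # Border points receive the label but do not expand further.
                    if not is_border[j]:
                        queue.append(j)
        next_label += 1
    return labels, is_border


if __name__ == "__main__":
    # Two tight 5-point groups far apart.
    square = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    data = square + [(x + 10, y + 10) for x, y in square]
    labels, borders = differential_density_clusters(data, k=3)
    print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(borders)
```

Unlike plain DBSCAN, this sketch has no global `eps` radius; its only thresholds are `k` and the relative ratio `max_ratio`, which loosely mirrors the abstract's point that S-DBSCAN does not depend on a global density threshold. The brute-force neighbor search keeps it self-contained but scales quadratically with the number of points.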
doi_str_mv	10.1016/j.knosys.2022.108288
format	Article
fullrecord	<record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_03689167v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705122000946</els_id><sourcerecordid>2642939671</sourcerecordid><originalsourceid>FETCH-LOGICAL-c414t-5feca0dd48baea7ffdfc5c909ea88d48c968b584ba72bf8975c8b31729a5d4ab3</originalsourceid><addsrcrecordid>eNp9kMFPwyAYxYnRxDn9Dzw08eShEygtcDGZmzrNoofpmVAKyqxlQttk_700XTx6-r68770X-AFwieAMQVTcbGdfjQv7MMMQ4ygxzNgRmCBGcUoJ5MdgAnkOUwpzdArOQthCGJ2ITcDzUrdatdY1iTNJI9vOyzpRdRda7UPSW5ls0uXdZjF_SeKqa5O2XWObj6SP90NsvJ-DEyProC8OcwreH-7fFqt0_fr4tJivU0UQadPcaCVhVRFWSi2pMZVRueKQa8lYVBUvWJkzUkqKS8M4zRUrM0Qxl3lFZJlNwfXY-ylrsfP2W_q9cNKK1XwtBg1mBeOooD2K3qvRu_Pup9OhFVvX-SY-T-CCYJ7xgg4uMrqUdyF4bf5qERQDYbEVI2ExEBYj4Ri7HWM6_ra32ougrG6UrqyPTEXl7P8Fv4FdhLc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2642939671</pqid></control><display><type>article</type><title>Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN</title><source>Elsevier ScienceDirect Journals</source><creator>Ros, Frédéric ; Guillaume, Serge ; Riad, Rabia ; El Hajji, Mohamed</creator><creatorcontrib>Ros, Frédéric ; Guillaume, Serge ; Riad, Rabia ; El Hajji, Mohamed</creatorcontrib><description>Density-based clustering algorithms have made a large impact on a wide range of application fields application. As more data are available with rising size and various internal organizations, non-parametric unsupervised procedures are becoming ever more important in understanding datasets. In this paper a new clustering algorithm S-DBSCAN11A sample code is (will be when ready for publication) available at: http://frederic.rosresearch.free.fr/mydata/homepage/. is proposed in the context of knowledge discovery. 
S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with notable differences and advantages, such as working in a differential mode. It is formalized as a very simple hierarchical process that combines distance, k-nearest-neighbor and density-peak concepts. It aims at partitioning the data into clusters until no further clustering can be done. The information it delivers allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithmic core (S-DBSCANCORE) with judiciously chosen input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have densities that are too different are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2022.108288</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithms ; Clustering ; Datasets ; Density ; Distance ; Life Sciences ; Natural cluster ; Neighbors ; Self tuning</subject><ispartof>Knowledge-based systems, 2022-04, Vol.241, p.108288, Article 108288</ispartof><rights>2022 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Apr 6, 2022</rights><rights>Attribution - NonCommercial</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c414t-5feca0dd48baea7ffdfc5c909ea88d48c968b584ba72bf8975c8b31729a5d4ab3</citedby><cites>FETCH-LOGICAL-c414t-5feca0dd48baea7ffdfc5c909ea88d48c968b584ba72bf8975c8b31729a5d4ab3</cites><orcidid>0000-0002-0327-8249 ; 0000-0001-9954-8399 ; 0000-0001-8626-213X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0950705122000946$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,314,776,780,881,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://hal.inrae.fr/hal-03689167$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Ros, Frédéric</creatorcontrib><creatorcontrib>Guillaume, Serge</creatorcontrib><creatorcontrib>Riad, Rabia</creatorcontrib><creatorcontrib>El Hajji, Mohamed</creatorcontrib><title>Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN</title><title>Knowledge-based systems</title><description>Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with increasing size and varied internal structures, non-parametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery (sample code will be available, when ready for publication, at http://frederic.rosresearch.free.fr/mydata/homepage/). S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with notable differences and advantages, such as working in a differential mode. 
It is formalized as a very simple hierarchical process that combines distance, k-nearest-neighbor and density-peak concepts. It aims at partitioning the data into clusters until no further clustering can be done. The information it delivers allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithmic core (S-DBSCANCORE) with judiciously chosen input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have densities that are too different are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. 
It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Datasets</subject><subject>Density</subject><subject>Distance</subject><subject>Life Sciences</subject><subject>Natural cluster</subject><subject>Neighbors</subject><subject>Self tuning</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMFPwyAYxYnRxDn9Dzw08eShEygtcDGZmzrNoofpmVAKyqxlQttk_700XTx6-r68770X-AFwieAMQVTcbGdfjQv7MMMQ4ygxzNgRmCBGcUoJ5MdgAnkOUwpzdArOQthCGJ2ITcDzUrdatdY1iTNJI9vOyzpRdRda7UPSW5ls0uXdZjF_SeKqa5O2XWObj6SP90NsvJ-DEyProC8OcwreH-7fFqt0_fr4tJivU0UQadPcaCVhVRFWSi2pMZVRueKQa8lYVBUvWJkzUkqKS8M4zRUrM0Qxl3lFZJlNwfXY-ylrsfP2W_q9cNKK1XwtBg1mBeOooD2K3qvRu_Pup9OhFVvX-SY-T-CCYJ7xgg4uMrqUdyF4bf5qERQDYbEVI2ExEBYj4Ri7HWM6_ra32ougrG6UrqyPTEXl7P8Fv4FdhLc</recordid><startdate>20220406</startdate><enddate>20220406</enddate><creator>Ros, Frédéric</creator><creator>Guillaume, Serge</creator><creator>Riad, Rabia</creator><creator>El Hajji, Mohamed</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><general>Elsevier</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0002-0327-8249</orcidid><orcidid>https://orcid.org/0000-0001-9954-8399</orcidid><orcidid>https://orcid.org/0000-0001-8626-213X</orcidid></search><sort><creationdate>20220406</creationdate><title>Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN</title><author>Ros, Frédéric ; Guillaume, Serge ; Riad, Rabia ; El Hajji, 
Mohamed</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c414t-5feca0dd48baea7ffdfc5c909ea88d48c968b584ba72bf8975c8b31729a5d4ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Datasets</topic><topic>Density</topic><topic>Distance</topic><topic>Life Sciences</topic><topic>Natural cluster</topic><topic>Neighbors</topic><topic>Self tuning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ros, Frédéric</creatorcontrib><creatorcontrib>Guillaume, Serge</creatorcontrib><creatorcontrib>Riad, Rabia</creatorcontrib><creatorcontrib>El Hajji, Mohamed</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ros, Frédéric</au><au>Guillaume, Serge</au><au>Riad, Rabia</au><au>El Hajji, Mohamed</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN</atitle><jtitle>Knowledge-based 
systems</jtitle><date>2022-04-06</date><risdate>2022</risdate><volume>241</volume><spage>108288</spage><pages>108288-</pages><artnum>108288</artnum><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>Density-based clustering algorithms have made a large impact on a wide range of application fields. As more data become available, with increasing size and varied internal structures, non-parametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery (sample code will be available, when ready for publication, at http://frederic.rosresearch.free.fr/mydata/homepage/). S-DBSCAN belongs to the connectivity-based family, like DBSCAN, but with notable differences and advantages, such as working in a differential mode. It is formalized as a very simple hierarchical process that combines distance, k-nearest-neighbor and density-peak concepts. It aims at partitioning the data into clusters until no further clustering can be done. The information it delivers allows the user to intuitively deduce different sets of natural partitions into clusters at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithmic core (S-DBSCANCORE) with judiciously chosen input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have densities that are too different are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, not dependent on a global density threshold. Tests were carried out using 2-dimensional benchmark datasets of various shapes and densities. They showed that S-DBSCAN was highly effective. 
It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2022.108288</doi><orcidid>https://orcid.org/0000-0002-0327-8249</orcidid><orcidid>https://orcid.org/0000-0001-9954-8399</orcidid><orcidid>https://orcid.org/0000-0001-8626-213X</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2022-04, Vol.241, p.108288, Article 108288
issn	0950-7051 1872-7409
language	eng
recordid	cdi_hal_primary_oai_HAL_hal_03689167v1
source	Elsevier ScienceDirect Journals
subjects	Algorithms; Clustering; Datasets; Density; Distance; Life Sciences; Natural cluster; Neighbors; Self tuning
title	Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T05%3A06%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detection%20of%20natural%20clusters%20via%20S-DBSCAN%20a%20Self-tuning%20version%20of%20DBSCAN&rft.jtitle=Knowledge-based%20systems&rft.au=Ros,%20Fr%C3%A9d%C3%A9ric&rft.date=2022-04-06&rft.volume=241&rft.spage=108288&rft.pages=108288-&rft.artnum=108288&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2022.108288&rft_dat=%3Cproquest_hal_p%3E2642939671%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2642939671&rft_id=info:pmid/&rft_els_id=S0950705122000946&rfr_iscdi=true