A Population Background for Nonparametric Density-Based Clustering

Despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, whereas for other statistical problems the theoretical population goal is...

Bibliographic details
Published in:	Statistical science 2015-11, Vol.30 (4), p.518-532
Main author:	Chacón, José E.
Format:	Article
Language:	eng
Subjects:	Algorithms; Clustering consistency; distance in measure; Hausdorff distance; Mathematical functions; Mathematical problems; modal clustering; Morse theory
Online access:	Full text

container_end_page	532
container_issue	4
container_start_page	518
container_title	Statistical science
container_volume	30
creator	Chacón, José E.
description	Despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, whereas for other statistical problems the theoretical population goal is clearly defined (as in regression or classification), for some of the clustering methodologies it is difficult to specify the population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of clustering by focusing on two main objectives: to provide an explicit formulation for the ideal population goal of the modal clustering methodology, which understands clusters as regions of high density; and to present two new loss functions, applicable in fact to any clustering methodology, to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal. In particular, it is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings that they induce is consistent.
doi_str_mv	10.1214/15-STS526
format	Article
publisher	Hayward: Institute of Mathematical Statistics
rights	Copyright © 2015 Institute of Mathematical Statistics
date	2015-11-01
eissn	2168-8745
tpages	15
toplevel	peer_reviewed; online_resources
oa	free_for_read
sourcetype	Open Access Repository
jstor_id	24780818
linktohtml	https://www.jstor.org/stable/24780818
linktopdf	https://www.jstor.org/stable/pdf/24780818
fulltext	fulltext
identifier	ISSN: 0883-4237
ispartof	Statistical science, 2015-11, Vol.30 (4), p.518-532
issn	0883-4237 2168-8745
language	eng
recordid	cdi_projecteuclid_primary_oai_CULeuclid_euclid_ss_1449670856
source	Jstor Complete Legacy; JSTOR Mathematics and Statistics; Project Euclid Complete; EZB Electronic Journals Library
subjects	Algorithms; Clustering consistency; distance in measure; Hausdorff distance; Mathematical functions; Mathematical problems; modal clustering; Morse theory
title	A Population Background for Nonparametric Density-Based Clustering
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T20%3A08%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proje&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Population%20Background%20for%20Nonparametric%20Density-Based%20Clustering&rft.jtitle=Statistical%20science&rft.au=Chac%C3%B3n,%20Jos%C3%A9%20E.&rft.date=2015-11-01&rft.volume=30&rft.issue=4&rft.spage=518&rft.epage=532&rft.pages=518-532&rft.issn=0883-4237&rft.eissn=2168-8745&rft_id=info:doi/10.1214/15-STS526&rft_dat=%3Cjstor_proje%3E24780818%3C/jstor_proje%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1753523445&rft_id=info:pmid/&rft_jstor_id=24780818&rfr_iscdi=true