Density-based multiscale data condensation

A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence, 2002-06, Vol. 24 (6), pp. 734-747
Main authors: Mitra, P.; Murthy, C.A.; Pal, S.K.
Format: Article
Language: English
Online access: Order full text
container_end_page 747
container_issue 6
container_start_page 734
container_title IEEE transactions on pattern analysis and machine intelligence
container_volume 24
creator Mitra, P.
Murthy, C.A.
Pal, S.K.
description A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion, which distinguishes it from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real-life data sets show that the multiscale approach is superior to several related condensation methods in terms of both condensation ratio and estimation error. The condensed set obtained was also experimentally shown to be effective for some important data mining tasks, such as classification, clustering, and rule generation on large data sets. Moreover, the algorithm is empirically found to be efficient in terms of sample complexity.
doi_str_mv 10.1109/TPAMI.2002.1008381
format Article
publisher IEEE, New York
coden ITPIDJ
date 2002-06-01
tpages 14
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2002
toplevel peer_reviewed
link https://ieeexplore.ieee.org/document/1008381
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2002-06, Vol.24 (6), p.734-747
issn 0162-8828
1939-3539
language eng
recordid cdi_crossref_primary_10_1109_TPAMI_2002_1008381
source IEEE Electronic Library (IEL)
subjects Algorithms
Clustering algorithms
Condensing
Data mining
Data reduction
Density
Density measurement
Error analysis
Estimation error
Intelligence
Iterative algorithms
Nearest neighbor searches
Pattern recognition
Sampling methods
Studies
Vector quantization
title Density-based multiscale data condensation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T23%3A34%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Density-based%20multiscale%20data%20condensation&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Mitra,%20P.&rft.date=2002-06-01&rft.volume=24&rft.issue=6&rft.spage=734&rft.epage=747&rft.pages=734-747&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2002.1008381&rft_dat=%3Cproquest_RIE%3E2431097381%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=884497072&rft_id=info:pmid/&rft_ieee_id=1008381&rfr_iscdi=true
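The abstract states that representative points are chosen "in a multiscale fashion" to follow the underlying density. It does not give the selection rule. Below is a minimal sketch of one such density-based greedy scheme. It assumes the following:

* The local density at a point is treated as inversely related to the distance r_k to its k-th nearest neighbour.
* The densest remaining point is kept as a representative.
* Every remaining point within 2·r_k of that representative is discarded.
* The integer k acts as the scale.

The helper names `kth_nn_distance` and `condense` are hypothetical. This is an illustrative sketch under those assumptions, not the authors' published implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kth_nn_distance(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    radii: List[float] = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # With fewer than k other points, fall back to the farthest one;
        # a lone point gets radius 0.
        radii.append(dists[min(k, len(dists)) - 1] if dists else 0.0)
    return radii


def condense(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy density-based condensation at scale k (hypothetical helper).

    Assumption: r_k is recomputed on the points still remaining at each step.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    remaining = list(points)
    selected: List[Point] = []
    while remaining:
        radii = kth_nn_distance(remaining, k)
        # Densest remaining point = smallest k-th neighbour distance
        # (ties go to the earliest point in the list).
        j = min(range(len(remaining)), key=radii.__getitem__)
        centre, r = remaining[j], radii[j]
        selected.append(centre)
        # Drop every point inside the disc of radius 2r around the
        # representative; the centre itself (distance 0) is always dropped,
        # so the loop terminates.
        remaining = [q for q in remaining if math.dist(centre, q) > 2 * r]
    return selected


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
    print(condense(pts, k=1))  # [(0.0,), (10.0,), (20.0,)]
    print(condense(pts, k=2))  # [(1.0,), (11.0,)]
```

A larger k gives larger discs and therefore a coarser condensed set. Calling `condense` with several values of k yields a family of reductions at different scales. The brute-force neighbour search is at least quadratic per iteration and is used here only to keep the sketch self-contained. A spatial index would be needed for the very large data sets the record describes.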