Towards a Simple Clustering Criterion Based on Minimum Length Encoding

Bibliographic details
Main authors: Ludl, Marcus-Christopher; Widmer, Gerhard
Format: Book chapter
Language: eng
Online access: Full text
container_title Lecture Notes in Computer Science
container_volume 2430
container_start_page 258
container_end_page 270
creator Ludl, Marcus-Christopher
Widmer, Gerhard
description We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example’s cluster membership. As a special operational case we develop the so-called rectangular uniform message length measure that can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
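The abstract names the rectangular uniform message length measure but does not give its formula. The following is a minimal Python sketch of a two-part MDL score in the same spirit, offered only as an illustration under stated assumptions: each example pays log2(k) bits for its cluster label, each attribute value is coded uniformly at a fixed precision `eps` within its cluster's tight bounding interval (the "restricted attribute domain"), and each rectangle boundary is coded at precision `eps` within the global attribute range. The names `bounding_box`, `grid_bits` and `message_length`, and the exact cost terms, are hypothetical simplifications, not the authors' measure.

```python
import math


def bounding_box(points):
    """Tight axis-aligned box: per-attribute (min, max) over a non-empty point set."""
    dims = len(points[0])
    lo = [min(p[j] for p in points) for j in range(dims)]
    hi = [max(p[j] for p in points) for j in range(dims)]
    return lo, hi


def grid_bits(width, eps):
    """Bits for a uniform code over the grid values spaced eps apart in [0, width]."""
    return math.log2(width / eps + 1)


def message_length(clusters, eps=1e-3):
    """Hypothetical two-part message length (in bits) of a rectangular clustering.

    clusters: list of non-empty lists of equal-length numeric tuples.
    Model part: every cluster rectangle costs two boundaries per attribute,
    each coded at precision eps within the global attribute range.
    Data part: every example costs log2(k) bits for its cluster label plus,
    per attribute, a uniform code restricted to its cluster's interval.
    """
    if not clusters or any(len(c) == 0 for c in clusters):
        raise ValueError("clusters must be a non-empty list of non-empty point lists")
    all_points = [p for c in clusters for p in c]
    g_lo, g_hi = bounding_box(all_points)
    dims = len(g_lo)
    k = len(clusters)

    # Cost of describing the k hyper-rectangles (the "model").
    model_bits = k * sum(2 * grid_bits(g_hi[j] - g_lo[j], eps) for j in range(dims))

    # Cost of the examples given their cluster's restricted attribute domains.
    data_bits = 0.0
    for c in clusters:
        lo, hi = bounding_box(c)
        per_example = math.log2(k) + sum(grid_bits(hi[j] - lo[j], eps) for j in range(dims))
        data_bits += len(c) * per_example
    return model_bits + data_bits
```

Under these assumptions, splitting evenly spaced 1-D points such as `[(i / 100,) for i in range(100)]` at the midpoint leaves the per-example cost almost unchanged (the extra label bit roughly cancels the halved interval), so the additional rectangle makes the split more expensive, in line with the abstract's claim that boundaries in uniformly populated regions are punished. Two well-separated groups, by contrast, get a shorter total message as two clusters than as one.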
doi_str_mv 10.1007/3-540-36755-1_22
format Book Chapter
contributor Toivonen, Hannu
Elomaa, Tapio
Mannila, Heikki
publisher Springer Berlin / Heidelberg
place Germany
isbn 9783540440369
3540440364
eisbn 9783540367550
3540367551
oclc 935291751
lccallnum Q334-342
pages 13
rights Springer-Verlag Berlin Heidelberg 2002
2003 INIST-CNRS
link_pdf https://link.springer.com/content/pdf/10.1007/3-540-36755-1_22
link_html https://link.springer.com/10.1007/3-540-36755-1_22
backlink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=14655655 (record in Pascal Francis)
identifier ISSN: 0302-9743
ispartof Lecture notes in computer science, 2002, Vol.2430, p.258-270
issn 0302-9743
1611-3349
language eng
recordid cdi_pascalfrancis_primary_14655655
source Springer Books
subjects Applied sciences
Artificial intelligence
Candidate Cluster
Computer science; control theory; systems
Exact sciences and technology
Instance Space
Learning and adaptive systems
Message Length
Minimum Description Length
Synthetic Dataset
title Towards a Simple Clustering Criterion Based on Minimum Length Encoding
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T09%3A21%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Towards%20a%20Simple%20Clustering%20Criterion%20Based%20on%20Minimum%20Length%20Encoding&rft.btitle=Lecture%20notes%20in%20computer%20science&rft.au=Ludl,%20Marcus-Christopher&rft.date=2002&rft.volume=2430&rft.spage=258&rft.epage=270&rft.pages=258-270&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540440369&rft.isbn_list=3540440364&rft_id=info:doi/10.1007/3-540-36755-1_22&rft_dat=%3Cproquest_pasca%3EEBC3073102_28_272%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540367550&rft.eisbn_list=3540367551&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3073102_28_272&rft_id=info:pmid/&rfr_iscdi=true