Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data

Symbolic data is aggregated from larger traditional datasets in order to hide entry-specific details and to make it possible to analyse large amounts of data, such as big data, that could not otherwise be processed. Symbolic data may appear in many different and complex forms, such as intervals and histograms. Identi...


Bibliographic details
Published in:	Advances in data analysis and classification 2021-06, Vol.15 (2), p.407-436
Main authors:	Umbleja, Kadri ; Ichino, Manabu ; Yaguchi, Hiroyuki
Format:	Article
Language:	eng
Subjects:	Algorithms ; Chemistry and Earth Sciences ; Clustering ; Complexity ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Economics ; Experimentation ; Finance ; Health Sciences ; Histograms ; Humanities ; Insurance ; Law ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Quantiles ; Regular Article ; Similarity ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences
Online access:	Full text

container_end_page	436
container_issue	2
container_start_page	407
container_title	Advances in data analysis and classification
container_volume	15
creator	Umbleja, Kadri ; Ichino, Manabu ; Yaguchi, Hiroyuki
description	Symbolic data is aggregated from larger traditional datasets in order to hide entry-specific details and to make it possible to analyse large amounts of data, such as big data, that could not otherwise be processed. Symbolic data may appear in many different and complex forms, such as intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. Conventional methods are not sufficient to cluster these sophisticated data types accurately. Over the years different approaches have been proposed, but they mainly concentrate on the “macroscopic” similarities between objects. Distributional data, for example symbolic data, is aggregated from large datasets, and thus even the smallest microscopic differences and similarities become extremely important. This paper proposes a method for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison makes it possible to identify similarities in small sections of a distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and was found to produce more adequate conceptual clusters during experimentation. Furthermore, thanks to the use of quantiles, the algorithm makes it easy to compare different types of symbolic data without any additional complexity.
doi_str_mv	10.1007/s11634-020-00411-w
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2535303046</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2535303046</sourcerecordid><originalsourceid>FETCH-LOGICAL-c367t-a9817169e182ef4b04d1deb1d9ce3d14d99070af494f1782ca89cab35066414a3</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOI7-AVcB19WbJn0tZfAFA250HdLk1snQaTpJ6jD_3taK7lzdw-Wcw-Ej5JrBLQMo7gJjORcJpJAACMaSwwlZsDJPk4xn2emvFsU5uQhhC5CDgGxBPp8teuX1xmrVUu06jX0cJtkOIaK33QetVUBDXUf3g-qibZHuMG6coY3z1Bocf81xMu6s9i5o11tNDUZl20BtR40N0dt6iNZ1Y7NRUV2Ss0a1Aa9-7pK8Pz68rZ6T9evTy-p-nWieFzFRVckKllfIyhQbUYMwzGDNTKWRGyZMVUEBqhGVaFhRplqVlVY1zyDPBROKL8nN3Nt7tx8wRLl1gx9XBJmOZDhwEPnoSmfXND94bGTv7U75o2QgJ75y5itHvvKbrzyMIT6HQj9RQv9X_U_qCwVmgMM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2535303046</pqid></control><display><type>article</type><title>Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data</title><source>SpringerLink Journals</source><creator>Umbleja, Kadri ; Ichino, Manabu ; Yaguchi, Hiroyuki</creator><creatorcontrib>Umbleja, Kadri ; Ichino, Manabu ; Yaguchi, Hiroyuki</creatorcontrib><description>Symbolic data is aggregated from bigger traditional datasets in order to hide entry specific details and to enable analysing large amounts of data, like big data, which would otherwise not be possible. Symbolic data may appear in many different but complex forms like intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. In order to accurately cluster these sophisticated data types, usual methods are not enough. Throughout the years different approaches have been proposed but they mainly concentrate on the “macroscopic” similarities between objects. 
Distributional data, for example symbolic data, has been aggregated from sets of large data and thus even the smallest microscopic differences and similarities become extremely important. In this paper a method is proposed for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison enables to identify similarities in small sections of distribution while producing more adequate hierarchical concepts. Proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and has been found to produce more adequate conceptual clusters during experimentation. Furthermore, thanks to the usage of quantiles, this algorithm allows us to compare different types of symbolic data easily without any additional complexity.</description><identifier>ISSN: 1862-5347</identifier><identifier>EISSN: 1862-5355</identifier><identifier>DOI: 10.1007/s11634-020-00411-w</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Chemistry and Earth Sciences ; Clustering ; Complexity ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Economics ; Experimentation ; Finance ; Health Sciences ; Histograms ; Humanities ; Insurance ; Law ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Quantiles ; Regular Article ; Similarity ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences</subject><ispartof>Advances in data analysis and classification, 2021-06, Vol.15 (2), p.407-436</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2020</rights><rights>Springer-Verlag GmbH Germany, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c367t-a9817169e182ef4b04d1deb1d9ce3d14d99070af494f1782ca89cab35066414a3</citedby><cites>FETCH-LOGICAL-c367t-a9817169e182ef4b04d1deb1d9ce3d14d99070af494f1782ca89cab35066414a3</cites><orcidid>0000-0001-8264-2856</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11634-020-00411-w$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11634-020-00411-w$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Umbleja, Kadri</creatorcontrib><creatorcontrib>Ichino, Manabu</creatorcontrib><creatorcontrib>Yaguchi, Hiroyuki</creatorcontrib><title>Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data</title><title>Advances in data analysis and classification</title><addtitle>Adv Data Anal Classif</addtitle><description>Symbolic data is aggregated from bigger traditional datasets in order to hide entry specific details and to enable analysing large amounts of data, like big data, which would otherwise not be possible. Symbolic data may appear in many different but complex forms like intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. In order to accurately cluster these sophisticated data types, usual methods are not enough. Throughout the years different approaches have been proposed but they mainly concentrate on the “macroscopic” similarities between objects. 
Distributional data, for example symbolic data, has been aggregated from sets of large data and thus even the smallest microscopic differences and similarities become extremely important. In this paper a method is proposed for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison enables to identify similarities in small sections of distribution while producing more adequate hierarchical concepts. Proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and has been found to produce more adequate conceptual clusters during experimentation. Furthermore, thanks to the usage of quantiles, this algorithm allows us to compare different types of symbolic data easily without any additional complexity.</description><subject>Algorithms</subject><subject>Chemistry and Earth Sciences</subject><subject>Clustering</subject><subject>Complexity</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Economics</subject><subject>Experimentation</subject><subject>Finance</subject><subject>Health Sciences</subject><subject>Histograms</subject><subject>Humanities</subject><subject>Insurance</subject><subject>Law</subject><subject>Management</subject><subject>Mathematics and Statistics</subject><subject>Medicine</subject><subject>Physics</subject><subject>Quantiles</subject><subject>Regular Article</subject><subject>Similarity</subject><subject>Statistical Theory and Methods</subject><subject>Statistics</subject><subject>Statistics for Business</subject><subject>Statistics for Engineering</subject><subject>Statistics for Life Sciences</subject><subject>Statistics for Social 
Sciences</subject><issn>1862-5347</issn><issn>1862-5355</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOI7-AVcB19WbJn0tZfAFA250HdLk1snQaTpJ6jD_3taK7lzdw-Wcw-Ej5JrBLQMo7gJjORcJpJAACMaSwwlZsDJPk4xn2emvFsU5uQhhC5CDgGxBPp8teuX1xmrVUu06jX0cJtkOIaK33QetVUBDXUf3g-qibZHuMG6coY3z1Bocf81xMu6s9i5o11tNDUZl20BtR40N0dt6iNZ1Y7NRUV2Ss0a1Aa9-7pK8Pz68rZ6T9evTy-p-nWieFzFRVckKllfIyhQbUYMwzGDNTKWRGyZMVUEBqhGVaFhRplqVlVY1zyDPBROKL8nN3Nt7tx8wRLl1gx9XBJmOZDhwEPnoSmfXND94bGTv7U75o2QgJ75y5itHvvKbrzyMIT6HQj9RQv9X_U_qCwVmgMM</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Umbleja, Kadri</creator><creator>Ichino, Manabu</creator><creator>Yaguchi, Hiroyuki</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-8264-2856</orcidid></search><sort><creationdate>20210601</creationdate><title>Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data</title><author>Umbleja, Kadri ; Ichino, Manabu ; Yaguchi, Hiroyuki</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c367t-a9817169e182ef4b04d1deb1d9ce3d14d99070af494f1782ca89cab35066414a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Chemistry and Earth Sciences</topic><topic>Clustering</topic><topic>Complexity</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Economics</topic><topic>Experimentation</topic><topic>Finance</topic><topic>Health Sciences</topic><topic>Histograms</topic><topic>Humanities</topic><topic>Insurance</topic><topic>Law</topic><topic>Management</topic><topic>Mathematics and 
Statistics</topic><topic>Medicine</topic><topic>Physics</topic><topic>Quantiles</topic><topic>Regular Article</topic><topic>Similarity</topic><topic>Statistical Theory and Methods</topic><topic>Statistics</topic><topic>Statistics for Business</topic><topic>Statistics for Engineering</topic><topic>Statistics for Life Sciences</topic><topic>Statistics for Social Sciences</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Umbleja, Kadri</creatorcontrib><creatorcontrib>Ichino, Manabu</creatorcontrib><creatorcontrib>Yaguchi, Hiroyuki</creatorcontrib><collection>CrossRef</collection><jtitle>Advances in data analysis and classification</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Umbleja, Kadri</au><au>Ichino, Manabu</au><au>Yaguchi, Hiroyuki</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data</atitle><jtitle>Advances in data analysis and classification</jtitle><stitle>Adv Data Anal Classif</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>15</volume><issue>2</issue><spage>407</spage><epage>436</epage><pages>407-436</pages><issn>1862-5347</issn><eissn>1862-5355</eissn><abstract>Symbolic data is aggregated from bigger traditional datasets in order to hide entry specific details and to enable analysing large amounts of data, like big data, which would otherwise not be possible. Symbolic data may appear in many different but complex forms like intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. In order to accurately cluster these sophisticated data types, usual methods are not enough. 
Throughout the years different approaches have been proposed but they mainly concentrate on the “macroscopic” similarities between objects. Distributional data, for example symbolic data, has been aggregated from sets of large data and thus even the smallest microscopic differences and similarities become extremely important. In this paper a method is proposed for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison enables to identify similarities in small sections of distribution while producing more adequate hierarchical concepts. Proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and has been found to produce more adequate conceptual clusters during experimentation. Furthermore, thanks to the usage of quantiles, this algorithm allows us to compare different types of symbolic data easily without any additional complexity.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s11634-020-00411-w</doi><tpages>30</tpages><orcidid>https://orcid.org/0000-0001-8264-2856</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1862-5347
ispartof	Advances in data analysis and classification, 2021-06, Vol.15 (2), p.407-436
issn	1862-5347 ; 1862-5355
language	eng
recordid	cdi_proquest_journals_2535303046
source	SpringerLink Journals
subjects	Algorithms ; Chemistry and Earth Sciences ; Clustering ; Complexity ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Economics ; Experimentation ; Finance ; Health Sciences ; Histograms ; Humanities ; Insurance ; Law ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Quantiles ; Regular Article ; Similarity ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences
title	Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T21%3A00%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hierarchical%20conceptual%20clustering%20based%20on%20quantile%20method%20for%20identifying%20microscopic%20details%20in%20distributional%20data&rft.jtitle=Advances%20in%20data%20analysis%20and%20classification&rft.au=Umbleja,%20Kadri&rft.date=2021-06-01&rft.volume=15&rft.issue=2&rft.spage=407&rft.epage=436&rft.pages=407-436&rft.issn=1862-5347&rft.eissn=1862-5355&rft_id=info:doi/10.1007/s11634-020-00411-w&rft_dat=%3Cproquest_cross%3E2535303046%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2535303046&rft_id=info:pmid/&rfr_iscdi=true