Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes

Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mix...

Detailed description

Bibliographic details
Published in:	arXiv.org 2024-03
Main authors:	Soemitro, Dylan ; Jeova Farias Sales Rocha Neto
Format:	Article
Language:	eng
Subjects:	Algorithms ; Clustering ; Data mining ; Nodes ; Similarity
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Soemitro, Dylan ; Jeova Farias Sales Rocha Neto
description	Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that it provides a competitive alternative to them in terms of performance and runtime.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2955961408</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2955961408</sourcerecordid><originalsourceid>FETCH-proquest_journals_29559614083</originalsourceid><addsrcrecordid>eNqNi7EOgjAUABsTE4nyDy9xJqktRZgRdVAX3UkDDywhtLbF4N_L4Ac43XB3CxIwzndRGjO2IqFzHaWUJXsmBA_I5W6w8lb2kPej82jV0IJuIJceW21VNRs51HBVE9aR_xiEg_QS3kpCMc0jnKw0T7jpGt2GLBvZOwx_XJPtsXjk58hY_RrR-bLTox1mVbJMiCzZxTTl_1Vf_rg8xg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2955961408</pqid></control><display><type>article</type><title>Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes</title><source>Free E- Journals</source><creator>Soemitro, Dylan ; Jeova Farias Sales Rocha Neto</creator><creatorcontrib>Soemitro, Dylan ; Jeova Farias Sales Rocha Neto</creatorcontrib><description>Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. 
Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that it provides a competitive alternative to them in terms of performance and runtime.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Clustering ; Data mining ; Nodes ; Similarity</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>778,782</link.rule.ids></links><search><creatorcontrib>Soemitro, Dylan</creatorcontrib><creatorcontrib>Jeova Farias Sales Rocha Neto</creatorcontrib><title>Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes</title><title>arXiv.org</title><description>Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. 
This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that it provides a competitive alternative to them in terms of performance and runtime.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Data mining</subject><subject>Nodes</subject><subject>Similarity</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNi7EOgjAUABsTE4nyDy9xJqktRZgRdVAX3UkDDywhtLbF4N_L4Ac43XB3CxIwzndRGjO2IqFzHaWUJXsmBA_I5W6w8lb2kPej82jV0IJuIJceW21VNRs51HBVE9aR_xiEg_QS3kpCMc0jnKw0T7jpGt2GLBvZOwx_XJPtsXjk58hY_RrR-bLTox1mVbJMiCzZxTTl_1Vf_rg8xg</recordid><startdate>20240308</startdate><enddate>20240308</enddate><creator>Soemitro, Dylan</creator><creator>Jeova Farias Sales Rocha Neto</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240308</creationdate><title>Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes</title><author>Soemitro, Dylan ; Jeova Farias Sales Rocha Neto</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29559614083</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Data mining</topic><topic>Nodes</topic><topic>Similarity</topic><toplevel>online_resources</toplevel><creatorcontrib>Soemitro, Dylan</creatorcontrib><creatorcontrib>Jeova Farias Sales Rocha Neto</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Soemitro, Dylan</au><au>Jeova Farias Sales Rocha Neto</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes</atitle><jtitle>arXiv.org</jtitle><date>2024-03-08</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that it provides a competitive alternative to them in terms of performance and runtime.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2955961408
source	Free E- Journals
subjects	Algorithms ; Clustering ; Data mining ; Nodes ; Similarity
title	Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T12%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Spectral%20Clustering%20of%20Categorical%20and%20Mixed-type%20Data%20via%20Extra%20Graph%20Nodes&rft.jtitle=arXiv.org&rft.au=Soemitro,%20Dylan&rft.date=2024-03-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2955961408%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2955961408&rft_id=info:pmid/&rfr_iscdi=true