A mixture model approach to spectral clustering and application to textual data

Bibliographic details
Published in: Statistical methods & applications, 2022-12, Vol. 31 (5), pp. 1071-1097
Main authors: Di Nuzzo, Cinzia; Ingrassia, Salvatore
Format: Article
Language: English
container_end_page 1097
container_issue 5
container_start_page 1071
container_title Statistical methods & applications
container_volume 31
creator Di Nuzzo, Cinzia
Ingrassia, Salvatore
description The spectral clustering algorithm is a technique based on the properties of the pairwise similarity matrix obtained from a suitable kernel function. It is useful for high-dimensional data because the units are clustered in a feature space of reduced dimension. In this paper, we consider a two-step model-based approach within the spectral clustering framework. First, using simulated data, we discuss criteria for selecting the number of clusters and analyze the robustness of the model-based approach with respect to the choice of the proximity parameters of the kernel functions. Finally, we apply the spectral methods to cluster five real textual datasets and, in this framework, also propose a new kernel function. The approach is illustrated through a large numerical study based on both simulated and real datasets.
doi_str_mv 10.1007/s10260-022-00635-4
format Article
publisher Springer Berlin Heidelberg, Berlin/Heidelberg
rights Springer-Verlag GmbH Germany, part of Springer Nature 2022
toplevel peer_reviewed
date 2022-12-01
tpages 27
stitle Stat Methods Appl
orcidid https://orcid.org/0000-0003-2052-4226
https://orcid.org/0000-0003-2062-8930
fulltext fulltext
identifier ISSN: 1618-2510
ispartof Statistical methods & applications, 2022-12, Vol.31 (5), p.1071-1097
issn 1618-2510
1613-981X
language eng
recordid cdi_proquest_journals_2743528100
source EBSCOhost Business Source Complete; SpringerLink Journals - AutoHoldings
subjects Algorithms
Chemistry and Earth Sciences
Cluster analysis
Clustering
Computer Science
Datasets
Economics
Finance
Health Sciences
Humanities
Insurance
Kernel functions
Law
Management
Mathematics and Statistics
Medicine
Original Paper
Physics
Probabilistic models
Robustness (mathematics)
Spectral methods
Statistical Theory and Methods
Statistics
Statistics for Business
Statistics for Engineering
Statistics for Life Sciences
Statistics for Social Sciences
title A mixture model approach to spectral clustering and application to textual data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T01%3A28%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20mixture%20model%20approach%20to%20spectral%20clustering%20and%20application%20to%20textual%20data&rft.jtitle=Statistical%20methods%20&%20applications&rft.au=Di%20Nuzzo,%20Cinzia&rft.date=2022-12-01&rft.volume=31&rft.issue=5&rft.spage=1071&rft.epage=1097&rft.pages=1071-1097&rft.issn=1618-2510&rft.eissn=1613-981X&rft_id=info:doi/10.1007/s10260-022-00635-4&rft_dat=%3Cproquest_cross%3E2743528100%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2743528100&rft_id=info:pmid/&rfr_iscdi=true
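The `description` field above outlines a two-step pipeline: build a kernel similarity matrix, embed the units in a low-dimensional spectral feature space, then cluster the embedded units with a mixture model. The sketch below illustrates that general idea under assumptions of our own, not the authors' method: a Gaussian kernel whose bandwidth `sigma` stands in for the proximity parameter, the Ng-Jordan-Weiss normalized embedding, and a diagonal-covariance Gaussian mixture fitted by EM. All function names (`gaussian_affinity`, `spectral_embedding`, `gmm_em`, `spectral_mixture_clustering`) are hypothetical, and the new kernel the paper proposes for textual data is not reproduced here.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Pairwise Gaussian-kernel similarities; sigma plays the role of a proximity parameter (assumption)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-similarity, as in Ng-Jordan-Weiss
    return A


def spectral_embedding(A, k):
    """Top-k eigenvectors of D^{-1/2} A D^{-1/2}, with each row scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    Y = vecs[:, -k:]  # keep the eigenvectors of the k largest eigenvalues
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)


def gmm_em(Y, k, n_iter=100, floor=1e-6):
    """Diagonal-covariance Gaussian mixture fitted by EM; returns hard MAP labels (hypothetical helper)."""
    n, p = Y.shape
    # Deterministic farthest-first initialisation of the component means
    means = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - m) ** 2, axis=1) for m in means], axis=0)
        means.append(Y[np.argmax(dist)])
    means = np.array(means)
    variances = np.tile(Y.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each unit
        diff = Y[:, None, :] - means[None, :, :]
        log_dens = -0.5 * np.sum(diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2)
        log_post = np.log(weights) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stabilisation
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ Y) / nk[:, None]
        diff = Y[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkp->kp", resp, diff ** 2) / nk[:, None] + floor
    return resp.argmax(axis=1)


def spectral_mixture_clustering(X, k, sigma):
    """Step 1: spectral embedding; step 2: model-based clustering of the embedded units."""
    return gmm_em(spectral_embedding(gaussian_affinity(X, sigma), k), k)


# Usage on two well-separated synthetic blobs (illustrative data, not from the paper)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)), rng.normal(10.0, 0.5, size=(20, 2))])
labels = spectral_mixture_clustering(X, k=2, sigma=1.0)
print(labels)
```

In this sketch the number of clusters `k` is fixed in advance; the abstract says the paper discusses criteria for selecting it, which are not attempted here. Label values are arbitrary up to permutation, so only the grouping of units is meaningful.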