A topological data analysis based classifier

Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classif...

Bibliographic Details
Published in:	Advances in data analysis and classification 2024-06, Vol.18 (2), p.493-538
Main Authors:	Kindelan, Rolando; Frías, José; Cerda, Mauricio; Hitschfeld, Nancy
Format:	Article
Language:	eng
Subjects:	Chemistry and Earth Sciences; Classification; Classifiers; Computer Science; Data analysis; Data Mining and Knowledge Discovery; Datasets; Economics; Filtration; Finance; Health Sciences; Homology; Humanities; Insurance; Labels; Law; Machine learning; Management; Mathematics and Statistics; Medicine; Physics; Regular Article; Resampling; Statistical Theory and Methods; Statistics; Statistics for Business; Statistics for Engineering; Statistics for Life Sciences; Statistics for Social Sciences; Support vector machines; Topology
Online Access:	Full text

container_end_page	538
container_issue	2
container_start_page	493
container_title	Advances in data analysis and classification
container_volume	18
creator	Kindelan, Rolando; Frías, José; Cerda, Mauricio; Hitschfeld, Nancy
description	Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset, representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the entire history of the filtered simplicial complex and is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods when classifying multi-class imbalanced data with high imbalance ratios and data with overlapping classes. Also, on average, the proposed method was better than K-Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.
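The description above mentions using each unlabeled point's star and link to propagate labels. As a rough illustration only, the following Python sketch computes the star and link of a vertex in a small hand-built simplicial complex and applies a simple majority vote. The function names (`closure`, `star`, `link`, `propagate_label`), the toy complex, and the voting rule are all assumptions made for this example; the article's actual labeling function is described as depending on the filtration history and persistence diagrams, which this sketch does not model.

```python
from collections import Counter
from itertools import combinations


def closure(maximal_simplices):
    """Return all non-empty faces of the given simplices as frozensets."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for k in range(1, len(vertices) + 1):
            faces.update(frozenset(c) for c in combinations(vertices, k))
    return faces


def star(complex_, v):
    """Simplices of the complex that contain vertex v."""
    return {s for s in complex_ if v in s}


def link(complex_, v):
    """Faces obtained by removing v from each simplex in the star of v."""
    return {s - {v} for s in star(complex_, v) if len(s) > 1}


def propagate_label(complex_, labels, v):
    """Hypothetical rule: every simplex in link(v) casts one vote per labeled vertex."""
    votes = Counter()
    for simplex in link(complex_, v):
        for u in simplex:
            if u in labels:
                votes[labels[u]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy complex (assumed data): two triangles sharing vertex 2, plus an edge to vertex 5.
K = closure([(0, 1, 2), (2, 3, 4), (2, 5)])
labels = {0: "a", 1: "a", 3: "b", 5: "b"}  # vertices 2 and 4 are unlabeled
predicted = propagate_label(K, labels, 2)
print(predicted)
```

Because higher-dimensional simplices in the link contribute one vote per labeled vertex, points that share more high-order relationships with the unlabeled vertex carry more weight than in a plain nearest-neighbor count. That is the intuition this sketch tries to convey, not a reproduction of the published method.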
doi_str_mv	10.1007/s11634-023-00548-4
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3069981488</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3069981488</sourcerecordid><originalsourceid>FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</originalsourceid><addsrcrecordid>eNp9kDFPwzAQhS0EEqXwB5gisWK4s32JM1YVFKRKLDBbjuNUqUJTfOnQf08gCDame8P7nk6fENcIdwhQ3DNiro0EpSUAGSvNiZihzZUkTXT6m01xLi6YtwA5GKCZuF1kQ7_vu37TBt9ltR985ne-O3LLWeU51lnoPHPbtDFdirPGdxyvfu5cvD0-vC6f5Ppl9bxcrGVQBQzSoDKKTB2gtgFtrSuf57pQVGlVAvkCSYVQBQplgApLU8eSEH1DBZE1pOfiZtrdp_7jEHlw2_6Qxq_YacjL0qKxdmypqRVSz5xi4_apfffp6BDclxU3WXGjFfdtxZkR0hPEY3m3ielv-h_qE1BtYp8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3069981488</pqid></control><display><type>article</type><title>A topological data analysis based classifier</title><source>SpringerLink Journals - AutoHoldings</source><creator>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</creator><creatorcontrib>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</creatorcontrib><description>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. 
Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</description><identifier>ISSN: 1862-5347</identifier><identifier>EISSN: 1862-5355</identifier><identifier>DOI: 10.1007/s11634-023-00548-4</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Chemistry and Earth Sciences ; Classification ; Classifiers ; Computer Science ; Data analysis ; Data Mining and Knowledge Discovery ; Datasets ; Economics ; Filtration ; Finance ; Health Sciences ; Homology ; Humanities ; Insurance ; Labels ; Law ; Machine learning ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Regular Article ; Resampling ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences ; Support vector machines ; Topology</subject><ispartof>Advances in data analysis and classification, 2024-06, Vol.18 (2), 
p.493-538</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</cites><orcidid>0000-0002-4948-6051</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11634-023-00548-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11634-023-00548-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Kindelan, Rolando</creatorcontrib><creatorcontrib>Frías, José</creatorcontrib><creatorcontrib>Cerda, Mauricio</creatorcontrib><creatorcontrib>Hitschfeld, Nancy</creatorcontrib><title>A topological data analysis based classifier</title><title>Advances in data analysis and classification</title><addtitle>Adv Data Anal Classif</addtitle><description>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. 
Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. 
Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</description><subject>Chemistry and Earth Sciences</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Computer Science</subject><subject>Data analysis</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Datasets</subject><subject>Economics</subject><subject>Filtration</subject><subject>Finance</subject><subject>Health Sciences</subject><subject>Homology</subject><subject>Humanities</subject><subject>Insurance</subject><subject>Labels</subject><subject>Law</subject><subject>Machine learning</subject><subject>Management</subject><subject>Mathematics and Statistics</subject><subject>Medicine</subject><subject>Physics</subject><subject>Regular Article</subject><subject>Resampling</subject><subject>Statistical Theory and Methods</subject><subject>Statistics</subject><subject>Statistics for Business</subject><subject>Statistics for Engineering</subject><subject>Statistics for Life Sciences</subject><subject>Statistics for Social Sciences</subject><subject>Support vector machines</subject><subject>Topology</subject><issn>1862-5347</issn><issn>1862-5355</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kDFPwzAQhS0EEqXwB5gisWK4s32JM1YVFKRKLDBbjuNUqUJTfOnQf08gCDame8P7nk6fENcIdwhQ3DNiro0EpSUAGSvNiZihzZUkTXT6m01xLi6YtwA5GKCZuF1kQ7_vu37TBt9ltR985ne-O3LLWeU51lnoPHPbtDFdirPGdxyvfu5cvD0-vC6f5Ppl9bxcrGVQBQzSoDKKTB2gtgFtrSuf57pQVGlVAvkCSYVQBQplgApLU8eSEH1DBZE1pOfiZtrdp_7jEHlw2_6Qxq_YacjL0qKxdmypqRVSz5xi4_apfffp6BDclxU3WXGjFfdtxZkR0hPEY3m3ielv-h_qE1BtYp8</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Kindelan, Rolando</creator><creator>Frías, José</creator><creator>Cerda, Mauricio</creator><creator>Hitschfeld, 
Nancy</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-4948-6051</orcidid></search><sort><creationdate>20240601</creationdate><title>A topological data analysis based classifier</title><author>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Chemistry and Earth Sciences</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Computer Science</topic><topic>Data analysis</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Datasets</topic><topic>Economics</topic><topic>Filtration</topic><topic>Finance</topic><topic>Health Sciences</topic><topic>Homology</topic><topic>Humanities</topic><topic>Insurance</topic><topic>Labels</topic><topic>Law</topic><topic>Machine learning</topic><topic>Management</topic><topic>Mathematics and Statistics</topic><topic>Medicine</topic><topic>Physics</topic><topic>Regular Article</topic><topic>Resampling</topic><topic>Statistical Theory and Methods</topic><topic>Statistics</topic><topic>Statistics for Business</topic><topic>Statistics for Engineering</topic><topic>Statistics for Life Sciences</topic><topic>Statistics for Social Sciences</topic><topic>Support vector machines</topic><topic>Topology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kindelan, Rolando</creatorcontrib><creatorcontrib>Frías, José</creatorcontrib><creatorcontrib>Cerda, Mauricio</creatorcontrib><creatorcontrib>Hitschfeld, Nancy</creatorcontrib><collection>CrossRef</collection><jtitle>Advances in data analysis and classification</jtitle></facets><delivery><delcategory>Remote 
Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kindelan, Rolando</au><au>Frías, José</au><au>Cerda, Mauricio</au><au>Hitschfeld, Nancy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A topological data analysis based classifier</atitle><jtitle>Advances in data analysis and classification</jtitle><stitle>Adv Data Anal Classif</stitle><date>2024-06-01</date><risdate>2024</risdate><volume>18</volume><issue>2</issue><spage>493</spage><epage>538</epage><pages>493-538</pages><issn>1862-5347</issn><eissn>1862-5355</eissn><abstract>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. 
The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s11634-023-00548-4</doi><tpages>46</tpages><orcidid>https://orcid.org/0000-0002-4948-6051</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1862-5347
ispartof	Advances in data analysis and classification, 2024-06, Vol.18 (2), p.493-538
issn	1862-5347 1862-5355
language	eng
recordid	cdi_proquest_journals_3069981488
source	SpringerLink Journals - AutoHoldings
subjects	Chemistry and Earth Sciences; Classification; Classifiers; Computer Science; Data analysis; Data Mining and Knowledge Discovery; Datasets; Economics; Filtration; Finance; Health Sciences; Homology; Humanities; Insurance; Labels; Law; Machine learning; Management; Mathematics and Statistics; Medicine; Physics; Regular Article; Resampling; Statistical Theory and Methods; Statistics; Statistics for Business; Statistics for Engineering; Statistics for Life Sciences; Statistics for Social Sciences; Support vector machines; Topology
title	A topological data analysis based classifier
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A42%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20topological%20data%20analysis%20based%20classifier&rft.jtitle=Advances%20in%20data%20analysis%20and%20classification&rft.au=Kindelan,%20Rolando&rft.date=2024-06-01&rft.volume=18&rft.issue=2&rft.spage=493&rft.epage=538&rft.pages=493-538&rft.issn=1862-5347&rft.eissn=1862-5355&rft_id=info:doi/10.1007/s11634-023-00548-4&rft_dat=%3Cproquest_cross%3E3069981488%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3069981488&rft_id=info:pmid/&rfr_iscdi=true