A topological data analysis based classifier

Bibliographic details
Published in: Advances in data analysis and classification 2024-06, Vol. 18 (2), pp. 493-538
Main authors: Kindelan, Rolando; Frías, José; Cerda, Mauricio; Hitschfeld, Nancy
Format: Article
Language: English
container_end_page 538
container_issue 2
container_start_page 493
container_title Advances in data analysis and classification
container_volume 18
creator Kindelan, Rolando
Frías, José
Cerda, Mauricio
Hitschfeld, Nancy
description Topological Data Analysis (TDA) is an emerging field that aims to uncover the underlying topological structure of a dataset. TDA tools have commonly been used to build filters and topological descriptors that improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline that classifies balanced and imbalanced multi-class datasets without additional ML methods. The method is designed to handle multi-class and imbalanced classification problems without a data-resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset that represents high-order relationships in the data. Assuming that the filtration contains a meaningful sub-complex that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex based on the detected topological features. We use the link and star operators of each unlabeled point to provide multi-dimensional neighborhoods of different sizes, through which labels propagate from labeled to unlabeled points. The labeling function depends on the entire history of the filtered simplicial complex, which is encoded in the persistence diagrams at various dimensions. To validate the method, we selected eight datasets with different dimensions, degrees of class overlap, and imbalanced numbers of samples per class. TDABC outperforms all baseline methods when classifying multi-class imbalanced data with high imbalance ratios and data with overlapping classes. On balanced datasets, the proposed method was on average better than K Nearest Neighbors (KNN) and weighted KNN, and competitive with the Support Vector Machine and Random Forest baseline classifiers.
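The description above outlines the pipeline only at a high level. What follows is a minimal, hypothetical Python sketch of the general idea, not the authors' TDABC implementation. It assumes a Vietoris-Rips filtration as the filtered simplicial complex and computes 0-dimensional persistence with union-find. It then picks a sub-complex threshold with a hypothetical "largest gap between H0 death times" rule, which stands in for the paper's PH-guided selection. Finally, it labels an unlabeled vertex by a vote over its star, weighting each simplex by 1 / (1 + birth radius), which is also an assumption. The function names `rips_filtration`, `h0_deaths`, `choose_threshold`, `star`, `link`, and `classify` are illustrative only.

```python
import itertools
import math
from collections import Counter


def rips_filtration(points, max_dim=2):
    """Map each simplex (sorted tuple of vertex ids) to its Vietoris-Rips birth:
    0 for a vertex, otherwise the length of its longest edge."""
    n = len(points)
    filt = {(i,): 0.0 for i in range(n)}
    for size in range(2, max_dim + 2):  # edges, triangles, ... up to max_dim
        for simplex in itertools.combinations(range(n), size):
            filt[simplex] = max(math.dist(points[a], points[b])
                                for a, b in itertools.combinations(simplex, 2))
    return filt


def h0_deaths(filt):
    """Finite death times in the 0-dimensional persistence diagram, found with
    union-find over edges in order of birth (Kruskal's algorithm)."""
    parent = {s[0]: s[0] for s in filt if len(s) == 1}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for birth, (a, b) in sorted((t, s) for s, t in filt.items() if len(s) == 2):
        ra, rb = find(a), find(b)
        if ra != rb:  # two components merge: one H0 class dies here
            parent[ra] = rb
            deaths.append(birth)
    return deaths


def choose_threshold(deaths):
    """HYPOTHETICAL rule: stop just before the largest jump between consecutive
    H0 death times, so only the most persistent components stay separate."""
    if len(deaths) < 2:
        return deaths[-1] if deaths else 0.0
    _, i = max((deaths[j + 1] - deaths[j], j) for j in range(len(deaths) - 1))
    return deaths[i]


def star(filt, v, eps):
    """St(v): simplices of the sub-complex K_eps that contain vertex v."""
    return {s: t for s, t in filt.items() if t <= eps and v in s}


def link(filt, v, eps):
    """Lk(v) = {tau in K_eps : v not in tau and tau with v added is in K_eps}."""
    return {tuple(u for u in s if u != v) for s in star(filt, v, eps) if len(s) > 1}


def classify(filt, labels, v, eps):
    """Vote over labeled vertices in the star of v; simplices born at a smaller
    radius weigh more (HYPOTHETICAL weighting 1 / (1 + birth))."""
    votes = Counter()
    for s, t in star(filt, v, eps).items():
        for u in s:
            if u != v and u in labels:
                votes[labels[u]] += 1.0 / (1.0 + t)
    return votes.most_common(1)[0][0] if votes else None


# Two labeled clusters ("a", "b") plus one unlabeled point in each (ids 7, 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5), (5.4, 5.4)]
labels = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
filt = rips_filtration(points)
eps = choose_threshold(h0_deaths(filt))
print(classify(filt, labels, 7, eps), classify(filt, labels, 8, eps))  # a b
```

Because this sketch enumerates every simplex up to dimension 2, it is only practical for a handful of points. Real pipelines would typically build the filtration and persistence diagrams with a dedicated library such as GUDHI. It also does nothing specific for class imbalance, which the paper addresses through its own labeling function.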
doi_str_mv 10.1007/s11634-023-00548-4
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3069981488</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3069981488</sourcerecordid><originalsourceid>FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</originalsourceid><addsrcrecordid>eNp9kDFPwzAQhS0EEqXwB5gisWK4s32JM1YVFKRKLDBbjuNUqUJTfOnQf08gCDame8P7nk6fENcIdwhQ3DNiro0EpSUAGSvNiZihzZUkTXT6m01xLi6YtwA5GKCZuF1kQ7_vu37TBt9ltR985ne-O3LLWeU51lnoPHPbtDFdirPGdxyvfu5cvD0-vC6f5Ppl9bxcrGVQBQzSoDKKTB2gtgFtrSuf57pQVGlVAvkCSYVQBQplgApLU8eSEH1DBZE1pOfiZtrdp_7jEHlw2_6Qxq_YacjL0qKxdmypqRVSz5xi4_apfffp6BDclxU3WXGjFfdtxZkR0hPEY3m3ielv-h_qE1BtYp8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3069981488</pqid></control><display><type>article</type><title>A topological data analysis based classifier</title><source>SpringerLink Journals - AutoHoldings</source><creator>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</creator><creatorcontrib>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</creatorcontrib><description>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. 
Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</description><identifier>ISSN: 1862-5347</identifier><identifier>EISSN: 1862-5355</identifier><identifier>DOI: 10.1007/s11634-023-00548-4</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Chemistry and Earth Sciences ; Classification ; Classifiers ; Computer Science ; Data analysis ; Data Mining and Knowledge Discovery ; Datasets ; Economics ; Filtration ; Finance ; Health Sciences ; Homology ; Humanities ; Insurance ; Labels ; Law ; Machine learning ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Regular Article ; Resampling ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences ; Support vector machines ; Topology</subject><ispartof>Advances in data analysis and classification, 2024-06, Vol.18 (2), 
p.493-538</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</cites><orcidid>0000-0002-4948-6051</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11634-023-00548-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11634-023-00548-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Kindelan, Rolando</creatorcontrib><creatorcontrib>Frías, José</creatorcontrib><creatorcontrib>Cerda, Mauricio</creatorcontrib><creatorcontrib>Hitschfeld, Nancy</creatorcontrib><title>A topological data analysis based classifier</title><title>Advances in data analysis and classification</title><addtitle>Adv Data Anal Classif</addtitle><description>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. 
Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. 
Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</description><subject>Chemistry and Earth Sciences</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Computer Science</subject><subject>Data analysis</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Datasets</subject><subject>Economics</subject><subject>Filtration</subject><subject>Finance</subject><subject>Health Sciences</subject><subject>Homology</subject><subject>Humanities</subject><subject>Insurance</subject><subject>Labels</subject><subject>Law</subject><subject>Machine learning</subject><subject>Management</subject><subject>Mathematics and Statistics</subject><subject>Medicine</subject><subject>Physics</subject><subject>Regular Article</subject><subject>Resampling</subject><subject>Statistical Theory and Methods</subject><subject>Statistics</subject><subject>Statistics for Business</subject><subject>Statistics for Engineering</subject><subject>Statistics for Life Sciences</subject><subject>Statistics for Social Sciences</subject><subject>Support vector machines</subject><subject>Topology</subject><issn>1862-5347</issn><issn>1862-5355</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kDFPwzAQhS0EEqXwB5gisWK4s32JM1YVFKRKLDBbjuNUqUJTfOnQf08gCDame8P7nk6fENcIdwhQ3DNiro0EpSUAGSvNiZihzZUkTXT6m01xLi6YtwA5GKCZuF1kQ7_vu37TBt9ltR985ne-O3LLWeU51lnoPHPbtDFdirPGdxyvfu5cvD0-vC6f5Ppl9bxcrGVQBQzSoDKKTB2gtgFtrSuf57pQVGlVAvkCSYVQBQplgApLU8eSEH1DBZE1pOfiZtrdp_7jEHlw2_6Qxq_YacjL0qKxdmypqRVSz5xi4_apfffp6BDclxU3WXGjFfdtxZkR0hPEY3m3ielv-h_qE1BtYp8</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Kindelan, Rolando</creator><creator>Frías, José</creator><creator>Cerda, Mauricio</creator><creator>Hitschfeld, 
Nancy</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-4948-6051</orcidid></search><sort><creationdate>20240601</creationdate><title>A topological data analysis based classifier</title><author>Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c270t-4124254dc0d8c18d3ba663725b32905a7152ccbc5c9c0b194de9511af57558453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Chemistry and Earth Sciences</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Computer Science</topic><topic>Data analysis</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Datasets</topic><topic>Economics</topic><topic>Filtration</topic><topic>Finance</topic><topic>Health Sciences</topic><topic>Homology</topic><topic>Humanities</topic><topic>Insurance</topic><topic>Labels</topic><topic>Law</topic><topic>Machine learning</topic><topic>Management</topic><topic>Mathematics and Statistics</topic><topic>Medicine</topic><topic>Physics</topic><topic>Regular Article</topic><topic>Resampling</topic><topic>Statistical Theory and Methods</topic><topic>Statistics</topic><topic>Statistics for Business</topic><topic>Statistics for Engineering</topic><topic>Statistics for Life Sciences</topic><topic>Statistics for Social Sciences</topic><topic>Support vector machines</topic><topic>Topology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kindelan, Rolando</creatorcontrib><creatorcontrib>Frías, José</creatorcontrib><creatorcontrib>Cerda, Mauricio</creatorcontrib><creatorcontrib>Hitschfeld, Nancy</creatorcontrib><collection>CrossRef</collection><jtitle>Advances in data analysis and classification</jtitle></facets><delivery><delcategory>Remote 
Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kindelan, Rolando</au><au>Frías, José</au><au>Cerda, Mauricio</au><au>Hitschfeld, Nancy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A topological data analysis based classifier</atitle><jtitle>Advances in data analysis and classification</jtitle><stitle>Adv Data Anal Classif</stitle><date>2024-06-01</date><risdate>2024</risdate><volume>18</volume><issue>2</issue><spage>493</spage><epage>538</epage><pages>493-538</pages><issn>1862-5347</issn><eissn>1862-5355</eissn><abstract>Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. 
The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s11634-023-00548-4</doi><tpages>46</tpages><orcidid>https://orcid.org/0000-0002-4948-6051</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1862-5347
ispartof Advances in data analysis and classification, 2024-06, Vol.18 (2), p.493-538
issn 1862-5347
1862-5355
language eng
recordid cdi_proquest_journals_3069981488
source SpringerLink Journals - AutoHoldings
subjects Chemistry and Earth Sciences
Classification
Classifiers
Computer Science
Data analysis
Data Mining and Knowledge Discovery
Datasets
Economics
Filtration
Finance
Health Sciences
Homology
Humanities
Insurance
Labels
Law
Machine learning
Management
Mathematics and Statistics
Medicine
Physics
Regular Article
Resampling
Statistical Theory and Methods
Statistics
Statistics for Business
Statistics for Engineering
Statistics for Life Sciences
Statistics for Social Sciences
Support vector machines
Topology
title A topological data analysis based classifier
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A42%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20topological%20data%20analysis%20based%20classifier&rft.jtitle=Advances%20in%20data%20analysis%20and%20classification&rft.au=Kindelan,%20Rolando&rft.date=2024-06-01&rft.volume=18&rft.issue=2&rft.spage=493&rft.epage=538&rft.pages=493-538&rft.issn=1862-5347&rft.eissn=1862-5355&rft_id=info:doi/10.1007/s11634-023-00548-4&rft_dat=%3Cproquest_cross%3E3069981488%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3069981488&rft_id=info:pmid/&rfr_iscdi=true