A topological data analysis based classifier
Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classif...
Published in: | Advances in data analysis and classification 2024-06, Vol.18 (2), p.493-538 |
---|---|
Main authors: | Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 538 |
---|---|
container_issue | 2 |
container_start_page | 493 |
container_title | Advances in data analysis and classification |
container_volume | 18 |
creator | Kindelan, Rolando ; Frías, José ; Cerda, Mauricio ; Hitschfeld, Nancy |
description | Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset’s underlying topological information. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems with no data resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset representing high-order data relationships. Following the assumption that a meaningful sub-complex exists in the filtration that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering detected topological features. We use each unlabeled point’s link and star operators to provide different-sized and multi-dimensional neighborhoods to propagate labels from labeled to unlabeled points. The labeling function depends on the filtration’s entire history of the filtered simplicial complex and it is encoded within the persistence diagrams at various dimensions. We select eight datasets with different dimensions, degrees of class overlap, and imbalanced samples per class to validate our method. The TDABC outperforms all baseline methods classifying multi-class imbalanced data with high imbalanced ratios and data with overlapped classes. Also, on average, the proposed method was better than K Nearest Neighbors (KNN) and weighted KNN and behaved competitively with Support Vector Machine and Random Forest baseline classifiers in balanced datasets. |
doi_str_mv | 10.1007/s11634-023-00548-4 |
format | Article |
publisher | Berlin/Heidelberg: Springer Berlin Heidelberg |
rights | Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
eissn | 1862-5355 |
orcid | https://orcid.org/0000-0002-4948-6051 |
tpages | 46 |
toplevel | peer_reviewed ; online_resources |
fulltext | fulltext |
identifier | ISSN: 1862-5347 |
ispartof | Advances in data analysis and classification, 2024-06, Vol.18 (2), p.493-538 |
issn | 1862-5347 ; 1862-5355 |
language | eng |
recordid | cdi_proquest_journals_3069981488 |
source | SpringerLink Journals - AutoHoldings |
subjects | Chemistry and Earth Sciences ; Classification ; Classifiers ; Computer Science ; Data analysis ; Data Mining and Knowledge Discovery ; Datasets ; Economics ; Filtration ; Finance ; Health Sciences ; Homology ; Humanities ; Insurance ; Labels ; Law ; Machine learning ; Management ; Mathematics and Statistics ; Medicine ; Physics ; Regular Article ; Resampling ; Statistical Theory and Methods ; Statistics ; Statistics for Business ; Statistics for Engineering ; Statistics for Life Sciences ; Statistics for Social Sciences ; Support vector machines ; Topology |
title | A topological data analysis based classifier |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A42%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20topological%20data%20analysis%20based%20classifier&rft.jtitle=Advances%20in%20data%20analysis%20and%20classification&rft.au=Kindelan,%20Rolando&rft.date=2024-06-01&rft.volume=18&rft.issue=2&rft.spage=493&rft.epage=538&rft.pages=493-538&rft.issn=1862-5347&rft.eissn=1862-5355&rft_id=info:doi/10.1007/s11634-023-00548-4&rft_dat=%3Cproquest_cross%3E3069981488%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3069981488&rft_id=info:pmid/&rfr_iscdi=true |
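
The description above mentions star and link operators on a simplicial complex as the neighborhoods used to propagate labels. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a Vietoris-Rips complex built at a single hand-picked scale `epsilon` (the paper instead selects a sub-complex guided by persistent homology) and a plain unweighted vote (the paper's labeling function depends on the filtration history). All function names here (`rips_complex`, `star`, `link`, `classify`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def rips_complex(points, epsilon, max_dim=2):
    """Simplices (frozensets of vertex indices) of a Vietoris-Rips complex.

    A set of vertices forms a simplex when all pairwise distances
    are at most epsilon. Brute force; only suitable for tiny inputs.
    """
    n = len(points)
    simplices = {frozenset([i]) for i in range(n)}
    for k in range(2, max_dim + 2):  # k vertices span a (k-1)-simplex
        for combo in combinations(range(n), k):
            if all(math.dist(points[i], points[j]) <= epsilon
                   for i, j in combinations(combo, 2)):
                simplices.add(frozenset(combo))
    return simplices


def star(simplices, v):
    """All simplices that contain vertex v."""
    return {s for s in simplices if v in s}


def link(simplices, v):
    """Faces of the star of v that do not contain v themselves."""
    return {s - {v} for s in star(simplices, v) if len(s) > 1}


def classify(points, labels, v, epsilon):
    """Label vertex v by an unweighted vote over the labeled vertices in its link.

    labels[i] is None for unlabeled points. Returns None when the
    link holds no labeled vertex. (Simplification: no persistence weighting.)
    """
    complex_ = rips_complex(points, epsilon)
    votes = Counter()
    for simplex in link(complex_, v):
        for u in simplex:
            if labels[u] is not None:
                votes[labels[u]] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (0.5, 0.5)]
    lbl = ["a", "a", "a", "b", "b", None]
    print(classify(pts, lbl, 5, epsilon=1.5))
```

Because a vertex that sits in many higher-dimensional simplices of the link votes several times, densely connected neighbors carry more weight than isolated ones; this is one simple way to let the complex's structure, rather than a fixed neighbor count, shape the vote.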