Random forest kernel for high-dimension low sample size classification

High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the random forest dissimilarity, that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems while remaining very competitive on low- or non-HDLSS problems.
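The core idea described in the abstract, a random-forest similarity used as a precomputed SVM kernel, can be sketched as follows. This is an illustrative reconstruction with scikit-learn, assuming the standard leaf-co-occurrence proximity (fraction of trees in which two samples fall in the same leaf); it is not the authors' implementation, and the helper name `rf_similarity` and the toy data sizes are hypothetical.

```python
# Illustrative sketch of an RF-similarity kernel fed to an SVM (not the paper's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def rf_similarity(forest, A, B):
    """Fraction of trees in which a sample from A and a sample from B share a leaf."""
    leaves_a = forest.apply(A)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(B)  # shape (n_b, n_trees)
    # Compare leaf indices tree by tree, then average matches over the trees.
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

# Toy HDLSS-like problem: many features, few samples.
X, y = make_classification(n_samples=60, n_features=500, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The learned similarity matrix plays the role of a precomputed kernel.
svm = SVC(kernel="precomputed")
svm.fit(rf_similarity(rf, X_tr, X_tr), y_tr)          # (n_tr, n_tr) Gram matrix
accuracy = svm.score(rf_similarity(rf, X_te, X_tr), y_te)  # (n_te, n_tr) at test time
```

Each tree contributes a block-structured indicator matrix (1 when two samples share a leaf), so the averaged proximity is positive semi-definite and valid as an SVM kernel.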


Bibliographic details
Published in: Statistics and computing, 2024-02, Vol. 34 (1), Article 9
Main authors: Cavalheiro, Lucca Portes; Bernard, Simon; Barddal, Jean Paul; Heutte, Laurent
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s11222-023-10309-0
ISSN: 0960-3174
EISSN: 1573-1375
Source: SpringerLink Journals
Subjects:
Algorithms
Artificial Intelligence
Classification
Computer Science
Kernels
Machine Learning
Medical imaging
Original Paper
Probability and Statistics in Computer Science
Similarity
Statistical analysis
Statistical Theory and Methods
Statistics
Statistics and Computing/Statistics Programs