Random forest kernel for high-dimension low sample size classification
High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the random forest dissimilarity, that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems and remains at the same time very competitive for low or non-HDLSS problems.
Saved in:
Published in: | Statistics and computing 2024-02, Vol.34 (1), Article 9 |
---|---|
Main authors: | Cavalheiro, Lucca Portes ; Bernard, Simon ; Barddal, Jean Paul ; Heutte, Laurent |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
container_issue | 1 |
container_start_page | |
container_title | Statistics and computing |
container_volume | 34 |
creator | Cavalheiro, Lucca Portes ; Bernard, Simon ; Barddal, Jean Paul ; Heutte, Laurent |
description | High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the random forest dissimilarity, that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems and remains at the same time very competitive for low or non-HDLSS problems. |
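The core idea in the abstract — a random forest similarity used as a precomputed SVM kernel — can be sketched with scikit-learn. This is an illustrative reconstruction, not the authors' implementation: the `rf_similarity` function and the leaf co-membership definition of similarity are assumptions about the method, and the toy dataset is synthetic.

```python
# Sketch of the RFSVM idea: train a random forest, derive a pairwise
# similarity from leaf co-membership, and use it as a precomputed SVM kernel.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def rf_similarity(forest, A, B):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves_a = forest.apply(A)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(B)  # shape (n_b, n_trees)
    # Compare leaf indices tree by tree, then average over trees.
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

# A small HDLSS-like toy problem: few samples, many features.
X, y = make_classification(n_samples=60, n_features=500, n_informative=20,
                           random_state=0)
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Precomputed kernel matrices: train x train to fit, test x train to predict.
K_train = rf_similarity(forest, X_train, X_train)
K_test = rf_similarity(forest, X_test, X_train)

svm = SVC(kernel="precomputed")
svm.fit(K_train, y_train)
pred = svm.predict(K_test)
```

The similarity is an average of per-tree indicator kernels, so the resulting Gram matrix is positive semi-definite and valid for `SVC(kernel="precomputed")`; each sample has similarity 1 with itself.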
doi_str_mv | 10.1007/s11222-023-10309-0 |
format | Article |
publisher | New York: Springer US |
fulltext | fulltext |
identifier | ISSN: 0960-3174 |
ispartof | Statistics and computing, 2024-02, Vol.34 (1), Article 9 |
issn | 0960-3174 ; 1573-1375 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_04253909v1 |
source | SpringerLink Journals |
subjects | Algorithms ; Artificial Intelligence ; Classification ; Computer Science ; Kernels ; Machine Learning ; Medical imaging ; Original Paper ; Probability and Statistics in Computer Science ; Similarity ; Statistical analysis ; Statistical Theory and Methods ; Statistics ; Statistics and Computing/Statistics Programs |
title | Random forest kernel for high-dimension low sample size classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T05%3A53%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Random%20forest%20kernel%20for%20high-dimension%20low%20sample%20size%20classification&rft.jtitle=Statistics%20and%20computing&rft.au=Cavalheiro,%20Lucca%20Portes&rft.date=2024-02-01&rft.volume=34&rft.issue=1&rft.artnum=9&rft.issn=0960-3174&rft.eissn=1573-1375&rft_id=info:doi/10.1007/s11222-023-10309-0&rft_dat=%3Cproquest_hal_p%3E2879581430%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2879581430&rft_id=info:pmid/&rfr_iscdi=true |