Random forest kernel for high-dimension low sample size classification

High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the random forest dissimilarity, that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems while remaining very competitive on low- or non-HDLSS problems.
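The core idea described in the abstract, a random-forest similarity used as a precomputed SVM kernel, can be sketched as follows. This is an illustrative reconstruction with scikit-learn, assuming the standard leaf-co-occurrence proximity (fraction of trees in which two samples fall in the same leaf); it is not the authors' implementation, and the helper name `rf_similarity` and the toy data sizes are hypothetical.

```python
# Illustrative sketch of an RF-similarity kernel fed to an SVM (not the paper's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def rf_similarity(forest, A, B):
    """Fraction of trees in which a sample from A and a sample from B share a leaf."""
    leaves_a = forest.apply(A)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(B)  # shape (n_b, n_trees)
    # Compare leaf indices tree by tree, then average matches over the trees.
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

# Toy HDLSS-like problem: many features, few samples.
X, y = make_classification(n_samples=60, n_features=500, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The learned similarity matrix plays the role of a precomputed kernel.
svm = SVC(kernel="precomputed")
svm.fit(rf_similarity(rf, X_tr, X_tr), y_tr)          # (n_tr, n_tr) Gram matrix
accuracy = svm.score(rf_similarity(rf, X_te, X_tr), y_te)  # (n_te, n_tr) at test time
```

Each tree contributes a block-structured indicator matrix (1 when two samples share a leaf), so the averaged proximity is positive semi-definite and valid as an SVM kernel.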


Bibliographic details
Published in: Statistics and computing, 2024-02, Vol. 34 (1), Article 9
Main authors: Cavalheiro, Lucca Portes; Bernard, Simon; Barddal, Jean Paul; Heutte, Laurent
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s11222-023-10309-0
ISSN: 0960-3174
EISSN: 1573-1375
Source: SpringerLink Journals
Subjects:
Algorithms
Artificial Intelligence
Classification
Computer Science
Kernels
Machine Learning
Medical imaging
Original Paper
Probability and Statistics in Computer Science
Similarity
Statistical analysis
Statistical Theory and Methods
Statistics
Statistics and Computing/Statistics Programs