Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance

Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs...

Bibliographic details
Published in:	arXiv.org 2024-09
Main authors:	Jacobus G M van der Linden; Vos, Daniël; de Weerdt, Mathijs M; Verwer, Sicco; Demirović, Emir
Format:	Article
Language:	eng
Subjects:	Decision trees; Optimization; Synthetic data; Tuning
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Jacobus G M van der Linden; Vos, Daniël; de Weerdt, Mathijs M; Verwer, Sicco; Demirović, Emir
description	Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques. Additionally, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the exact opposite. In this paper, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? Our experimental evaluation examines 13 objective functions, including four novel objectives resulting from our analysis, seven tuning methods, and six claims from the literature on optimal and greedy methods on 165 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we discover new non-concave objectives, highlight the importance of proper tuning, support and refute several claims from the literature, and provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods, and code for future comparisons.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3107312185</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3107312185</sourcerecordid><originalsourceid>FETCH-proquest_journals_31073121853</originalsourceid><addsrcrecordid>eNqNik8LgjAcQEcQJOV3-EFXBd0yvXXo780I6SpLf9ZEN9um0Ldvhz5Ap8fjvRnxKGNxmG0oXRDfmDaKIrpNaZIwj9zzwYqed6A0nDVi_YEDVsIIJaFwbnZww8m5FfIJ9oVCQ_5osbJiQhNAMUoXAuCyhivqRumeywpXZN7wzqD_45KsT8difwkHrd4jGlu2atTSpZLFUcpiGmcJ--_6AsZAQKA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3107312185</pqid></control><display><type>article</type><title>Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance</title><source>Free E- Journals</source><creator>Jacobus G M van der Linden ; Vos, Daniël ; de Weerdt, Mathijs M ; Verwer, Sicco ; Demirović, Emir</creator><creatorcontrib>Jacobus G M van der Linden ; Vos, Daniël ; de Weerdt, Mathijs M ; Verwer, Sicco ; Demirović, Emir</creatorcontrib><description>Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques. Additionally, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the exact opposite. In this paper, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? 
Our experimental evaluation examines 13 objective functions, including four novel objectives resulting from our analysis, seven tuning methods, and six claims from the literature on optimal and greedy methods on 165 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we discover new non-concave objectives, highlight the importance of proper tuning, support and refute several claims from the literature, and provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods, and code for future comparisons.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Decision trees ; Optimization ; Synthetic data ; Tuning</subject><ispartof>arXiv.org, 2024-09</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Jacobus G M van der Linden</creatorcontrib><creatorcontrib>Vos, Daniël</creatorcontrib><creatorcontrib>de Weerdt, Mathijs M</creatorcontrib><creatorcontrib>Verwer, Sicco</creatorcontrib><creatorcontrib>Demirović, Emir</creatorcontrib><title>Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance</title><title>arXiv.org</title><description>Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. 
Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques. Additionally, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the exact opposite. In this paper, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? Our experimental evaluation examines 13 objective functions, including four novel objectives resulting from our analysis, seven tuning methods, and six claims from the literature on optimal and greedy methods on 165 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we discover new non-concave objectives, highlight the importance of proper tuning, support and refute several claims from the literature, and provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods, and code for future comparisons.</description><subject>Decision trees</subject><subject>Optimization</subject><subject>Synthetic data</subject><subject>Tuning</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNik8LgjAcQEcQJOV3-EFXBd0yvXXo780I6SpLf9ZEN9um0Ldvhz5Ap8fjvRnxKGNxmG0oXRDfmDaKIrpNaZIwj9zzwYqed6A0nDVi_YEDVsIIJaFwbnZww8m5FfIJ9oVCQ_5osbJiQhNAMUoXAuCyhivqRumeywpXZN7wzqD_45KsT8difwkHrd4jGlu2atTSpZLFUcpiGmcJ--_6AsZAQKA</recordid><startdate>20240919</startdate><enddate>20240919</enddate><creator>Jacobus G M van der 
Linden</creator><creator>Vos, Daniël</creator><creator>de Weerdt, Mathijs M</creator><creator>Verwer, Sicco</creator><creator>Demirović, Emir</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240919</creationdate><title>Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance</title><author>Jacobus G M van der Linden ; Vos, Daniël ; de Weerdt, Mathijs M ; Verwer, Sicco ; Demirović, Emir</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_31073121853</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Decision trees</topic><topic>Optimization</topic><topic>Synthetic data</topic><topic>Tuning</topic><toplevel>online_resources</toplevel><creatorcontrib>Jacobus G M van der Linden</creatorcontrib><creatorcontrib>Vos, Daniël</creatorcontrib><creatorcontrib>de Weerdt, Mathijs M</creatorcontrib><creatorcontrib>Verwer, Sicco</creatorcontrib><creatorcontrib>Demirović, Emir</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central 
Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jacobus G M van der Linden</au><au>Vos, Daniël</au><au>de Weerdt, Mathijs M</au><au>Verwer, Sicco</au><au>Demirović, Emir</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance</atitle><jtitle>arXiv.org</jtitle><date>2024-09-19</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques. Additionally, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the exact opposite. In this paper, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? 
Our experimental evaluation examines 13 objective functions, including four novel objectives resulting from our analysis, seven tuning methods, and six claims from the literature on optimal and greedy methods on 165 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we discover new non-concave objectives, highlight the importance of proper tuning, support and refute several claims from the literature, and provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods, and code for future comparisons.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-09
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_3107312185
source	Free E- Journals
subjects	Decision trees; Optimization; Synthetic data; Tuning
title	Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T12%3A35%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimal%20or%20Greedy%20Decision%20Trees?%20Revisiting%20their%20Objectives,%20Tuning,%20and%20Performance&rft.jtitle=arXiv.org&rft.au=Jacobus%20G%20M%20van%20der%20Linden&rft.date=2024-09-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3107312185%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3107312185&rft_id=info:pmid/&rfr_iscdi=true