Example-dependent cost-sensitive decision trees


Full description

Bibliographic details
Published in: Expert Systems with Applications, 2015-11, Vol. 42 (19), p. 6609-6619
Main authors: Correa Bahnsen, Alejandro; Aouada, Djamila; Ottersten, Björn
Format: Article
Language: English
Online access: Full text
Description
Highlights:
• Example-dependent cost-sensitive tree algorithm.
• Each example is assumed to have a different financial cost.
• Application to credit card fraud detection, credit scoring and direct marketing.
• Focus on maximizing financial savings instead of accuracy.
• Code is open source and available at albahnsen.com/CostSensitiveClassification.

Summary: Several real-world classification problems are example-dependent cost-sensitive in nature: the costs due to misclassification vary between examples. However, standard classification methods do not take these costs into account and instead assume a constant cost for all misclassification errors. State-of-the-art example-dependent cost-sensitive techniques only introduce the costs to the algorithm either before or after training, leaving open the question of how much can be gained by taking the real, example-dependent financial costs into account during training itself. In this paper, we propose an example-dependent cost-sensitive decision tree algorithm that incorporates the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. We then evaluate the proposed method on three databases from three real-world applications: credit card fraud detection, credit scoring and direct marketing. The results show that the proposed algorithm is the best performing method on all databases. Furthermore, compared against a standard decision tree, our method builds significantly smaller trees in only a fifth of the time while achieving superior cost savings, yielding a method that not only produces more business-oriented results but also creates simpler models that are easier to analyze.
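The cost-based impurity idea mentioned in the summary can be sketched as follows (a minimal illustration under assumptions, not the authors' implementation; the function name and the per-example cost vectors `c_fp`, `c_fn`, `c_tp`, `c_tn` are hypothetical): the impurity of a tree node is taken as the per-example cost of the cheaper of the two constant predictions over the node's examples, so splits are scored by how much financial cost they save rather than by Gini or entropy.

```python
import numpy as np

def cost_based_impurity(y, c_fp, c_fn, c_tp=None, c_tn=None):
    """Sketch of a cost-based impurity for a binary node.

    y     : 0/1 labels of the examples in the node.
    c_fp  : per-example cost of a false positive (hypothetical input).
    c_fn  : per-example cost of a false negative (hypothetical input).
    c_tp, c_tn : per-example costs of correct predictions (default 0).

    Returns the total cost of the cheaper constant prediction
    ("all negative" vs. "all positive"), normalized per example.
    """
    y = np.asarray(y)
    n = len(y)
    c_tp = np.zeros(n) if c_tp is None else np.asarray(c_tp)
    c_tn = np.zeros(n) if c_tn is None else np.asarray(c_tn)

    # Cost of predicting every example negative:
    # false-negative cost for actual positives, true-negative cost otherwise.
    cost_all_neg = np.where(y == 1, c_fn, c_tn).sum()
    # Cost of predicting every example positive:
    # true-positive cost for actual positives, false-positive cost otherwise.
    cost_all_pos = np.where(y == 1, c_tp, c_fp).sum()

    return min(cost_all_neg, cost_all_pos) / n
```

A split's gain can then be measured as the parent impurity minus the size-weighted impurities of the children, which directs the tree toward splits that maximize savings in the example-dependent costs.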
ISSN: 0957-4174
1873-6793
DOI: 10.1016/j.eswa.2015.04.042