Splitting criteria for classification problems with multi-valued attributes and large number of classes


Bibliographic details
Published in: Pattern Recognition Letters, 2018-08, Vol. 111, pp. 58-63
Authors: Laber, Eduardo Sany; de A. Mello Pereira, Felipe
Format: Article
Language: English
Highlights:
• A framework for designing splitting criteria for handling multi-valued attributes.
• An efficient variation of Gini Gain with an approximation guarantee of the optimal value.
• Experimental evidence that our new criteria are competitive with the Twoing criterion.
• Experimental evidence of the usefulness of aggregating nominal attributes.

Abstract: Decision Trees and Random Forests are among the most popular methods for classification tasks. Two key issues faced by these methods are how to select the best attribute to associate with a node and how to split the samples given the selected attribute. This paper addresses an important challenge that arises when nominal attributes with a large number of values are present: the computational time required to compute splits of good quality. We present a framework for generating computationally efficient splitting criteria that handle, with a theoretical approximation guarantee, multi-valued nominal attributes for classification tasks with a large number of classes. Experiments on a number of datasets suggest that a method derived from our framework is competitive, in terms of accuracy and speed, with the Twoing criterion, one of the few available criteria able to handle nominal attributes with many distinct values with an optimality guarantee. Our method has the added advantage of also efficiently handling datasets with a large number of classes. These experiments also give evidence of the potential of aggregating attributes to improve the classification power of decision trees.
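For readers unfamiliar with the quantity the paper builds on, the following is a minimal illustrative sketch (not the authors' method) of computing the Gini Gain of a candidate binary split of a multi-valued nominal attribute. The function names, toy data, and the representation of a split as a set of attribute values routed to the left child are assumptions for illustration; the computational challenge the paper targets is that the number of candidate value subsets grows exponentially with the number of distinct attribute values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum_c p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(samples, left_values):
    """Gini Gain of splitting `samples` (pairs of attribute value, class label)
    by routing the attribute values in `left_values` to the left child."""
    labels = [y for _, y in samples]
    left = [y for v, y in samples if v in left_values]
    right = [y for v, y in samples if v not in left_values]
    n = len(samples)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Toy dataset: a nominal attribute with values a/b/c and two classes.
data = [("a", 0), ("a", 0), ("b", 1), ("b", 1), ("c", 0), ("c", 1)]
print(gini_gain(data, {"a"}))   # gain from isolating value "a" in one child
```

Maximizing this gain exactly requires examining on the order of 2^(d-1) subsets for d distinct values, which is why efficient criteria with approximation guarantees, such as those the paper proposes, matter when d is large.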
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2018.04.013