Unsupervised feature selection for interpretable classification in behavioral assessment of children


Bibliographic details
Published in: Expert systems 2017-08, Vol.34 (4), p.n/a
Main authors: Jiménez, Fernando, Jódar, Rosalia, Martín, Maria del Pilar, Sánchez, Gracia, Sciavicco, Guido
Format: Article
Language: English
Description
Summary: In this paper, we consider a data set taken from the administration of the Behavior Assessment System for Children test to 157 subjects, and we approach the problem of clustering and classifying the subjects in an interpretable fashion. Because the Behavior Assessment System for Children test is originally composed of 149 questions (152 in the particular version used for this experiment), we first propose a feature selection wrapper model composed of a multi‐objective evolutionary algorithm, the iterative clustering method expectation–maximization, and the classifier C4.5 for the unsupervised feature selection towards the classification of the data with two objectives: maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We propose a methodology to integrate feature selection for unsupervised classification, model evaluation, decision‐making (to choose the most satisfactory model according to an a posteriori process in a multi‐objective context), and testing. The selected data set that results from this process, where each instance is labeled with its class, is then used for supervised learning via both C4.5 and a novel evolutionary computation‐based fuzzy classifier to obtain interpretable rules. We discuss and compare the behavior of two different evolutionary algorithms, ENORA (Evolutionary NOn‐dominated Radial slots based Algorithm) and NSGA‐II (Non‐dominated Sorting Genetic Algorithm II), at different levels: as search strategies for feature selection, as search strategies for fuzzy classification, and in terms of quality of the results. It turns out that ENORA behaves better in terms of quality of the result in the feature selection phase (obtaining a selection that shows higher accuracy under C4.5 after cross‐validation), and again in the fuzzy classification phase, from both points of view: hypervolume evolution and interpretability of results. During the entire process, the solutions are validated by the psychologists who collected the data.
ISSN:0266-4720
1468-0394
DOI:10.1111/exsy.12173
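
The summary above mentions two maximized objectives (clustering likelihood and classifier accuracy), an a posteriori choice among non-dominated models, and hypervolume as a quality measure. The following is a minimal sketch of those generic multi-objective notions only; it is not the authors' ENORA or NSGA-II implementation. The function names, the candidate values, and the reference point are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Each candidate feature subset is summarized by two objectives, both maximized:
# (clustering log-likelihood, classifier accuracy). Values below are made up.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    area = 0.0
    best_y = ref[1]
    # Walk from the largest first objective downward; each point adds the
    # rectangle slice that lies above the second-objective level covered so far.
    for x, y in sorted(set(front), reverse=True):
        if x > ref[0] and y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical candidates: (log-likelihood, accuracy) for four feature subsets.
candidates: List[Point] = [
    (-120.0, 0.80),
    (-100.0, 0.75),
    (-110.0, 0.70),
    (-130.0, 0.78),
]
front = pareto_front(candidates)
# Hypothetical reference point, worse than every candidate on both objectives.
volume = hypervolume_2d(front, ref=(-150.0, 0.5))
```

A decision-maker would then pick one model from `front` after the search has finished, which is what "a posteriori" refers to; a larger hypervolume indicates a front that covers more of the objective space relative to the chosen reference point.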