Metalearning for choosing feature selection algorithms in data mining: Proposal of a new framework

Bibliographic details
Published in: Expert systems with applications, 2017-06, Vol. 75, p. 1-24
Main authors: Parmezan, Antonio Rafael Sabino; Lee, Huei Diana; Wu, Feng Chung
Format: Article
Language: English
Description
Summary:
• A new Metalearning architecture proposal to recommend Feature Selection (FS) algorithms.
• FS algorithms are ranked using a new multicriteria performance measure.
• The proposed architecture has low computational cost and is human-understandable.
• Evaluation on 150 data sets from a literature review and four well-known FS algorithms.
• Accuracy higher than 90% was obtained in the recommendation of FS algorithms.

In Data Mining, the preprocessing step offers a considerable diversity of candidate algorithms for selecting important features according to some criterion. This broad availability of algorithms that perform the Feature Selection task makes it difficult to choose, a priori, the most promising of the available algorithms for a particular problem. In this paper, we present the proposal and evaluation of a new architecture for the recommendation of Feature Selection algorithms based on the use of Metalearning. Our framework is very flexible, since users can adapt it to their own needs. This flexibility is one of the main advantages of our proposal over other approaches in the literature, which involve steps that cannot be adapted to the user's local requirements. Furthermore, it combines several concepts of intelligent systems, including Machine Learning and Data Mining, with topics derived from expert systems, such as user- and data-driven knowledge, and with meta-knowledge. This set of solutions, coupled with leading-edge technologies, allows our architecture to be integrated into any information system, which contributes to automating services and reducing human effort during the process. Regarding the Metalearning process, our framework considers several types of properties inherent to the data sets, as well as Feature Selection algorithms based on information, distance, dependence, and consistency measures.
The quality of the Feature Selection methods was estimated according to a multicriteria performance measure, which guided the ranking of these algorithms for the construction of the metabases. Proposed by the authors of this work, this multicriteria performance measure combines any three measurements into a single one, creating a powerful tool for evaluating not only FS algorithms but also any context that requires combining measures, each of which is to be maximized or minimized. The recommendation models, represented by decision trees and induced from the training metabases, allowed us to see in what
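
The abstract outlines two building blocks of a metabase: properties computed from each data set (meta-features) and a ranking of FS algorithms obtained by combining three measurements. The following is a minimal Python sketch of that idea, not the authors' method. The chosen meta-features, the function names `meta_features` and `rank_algorithms`, the algorithm names, and all numbers are hypothetical, and the averaging of min-max-normalized criteria is only a placeholder for the paper's multicriteria measure, whose formula is not given in this record.

```python
import math
from collections import Counter


def meta_features(rows, labels):
    """Simple data set properties (an illustrative choice of meta-features)."""
    counts = Counter(labels)
    n = len(labels)
    # Shannon entropy of the class distribution, in bits
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n_examples": n,
        "n_features": len(rows[0]),
        "n_classes": len(counts),
        "class_entropy": entropy,
    }


def rank_algorithms(measurements, maximize):
    """Rank algorithms by the mean of min-max-normalized criteria.

    Placeholder aggregation (assumption), NOT the measure from the paper.
    maximize[j] is True if criterion j should be maximized, False if minimized.
    """
    names = list(measurements)
    n_criteria = len(maximize)
    scores = dict.fromkeys(names, 0.0)
    for j in range(n_criteria):
        column = [measurements[a][j] for a in names]
        lo, hi = min(column), max(column)
        for a in names:
            # Map each criterion to [0, 1]; a constant column contributes 0.5
            norm = 0.5 if hi == lo else (measurements[a][j] - lo) / (hi - lo)
            scores[a] += (norm if maximize[j] else 1.0 - norm) / n_criteria
    return sorted(names, key=lambda a: scores[a], reverse=True)


# Hypothetical measurements: (accuracy, fraction of features kept, runtime in s)
measurements = {
    "algo_a": (0.90, 0.50, 2.0),
    "algo_b": (0.85, 0.20, 1.0),
    "algo_c": (0.80, 0.80, 4.0),
}
ranking = rank_algorithms(measurements, maximize=(True, False, False))

# Toy data set: four examples, two features, two balanced classes
mf = meta_features(
    [[1.0, 2.0], [1.5, 0.5], [3.0, 1.0], [2.0, 2.5]],
    ["x", "x", "y", "y"],
)
```

In a metabase built this way, each data set would contribute one row: its meta-features as inputs and the top-ranked algorithm as the label. The record states that the recommendation models are decision trees induced from such training metabases; this sketch stops before that step.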
ISSN: 0957-4174
1873-6793
DOI: 10.1016/j.eswa.2017.01.013