A Hybrid Genetic Algorithm With Wrapper-Embedded Approaches for Feature Selection

Feature selection is an important research area for big data analysis. In recent years, various feature selection approaches have been developed, which can be divided into four categories: filter, wrapper, embedded, and combined methods. In the combined category, many hybrid genetic approaches from...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2018-01, Vol.6, p.22863-22874
Hauptverfasser: Liu, Xiao-Ying, Liang, Yong, Wang, Sai, Yang, Zi-Yi, Ye, Han-Shuo
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Feature selection is an important research area for big data analysis. In recent years, various feature selection approaches have been developed, which can be divided into four categories: filter, wrapper, embedded, and combined methods. In the combined category, many hybrid genetic approaches from evolutionary computations combine filter and wrapper measures of feature evaluation to implement a population-based global optimization with efficient local search. However, there are limitations to existing combined methods, such as the two-stage and inconsistent feature evaluation measures, difficulties in analyzing data with high feature interaction, and challenges in handling large-scale features and instances. Focusing on these three limitations, we proposed a hybrid genetic algorithm with wrapper−embedded feature approach for selection approach (HGAWE), which combines genetic algorithm (global search) with embedded regularization approaches (local search) together. We also proposed a novel chromosome representation (intron+exon) for global and local optimization procedures in HGAWE. Based on this "intron+exon" encoding, the regularization method can select the relevant features and construct the learning model simultaneously, and genetic operations aim to globally optimize the control parameters in the above non-convex regularization. We mention that any efficient regularization approach can serve as the embedded method in HGAWE, and a hybrid L_{1/2}+L_{2} regularization approach is investigated as an example in this paper. Empirical study of the HGAWE approach on some simulation data and five gene microarray data sets indicates that it outperforms the existing combined methods in terms of feature selection and classification accuracy.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2018.2818682