An approach for feature selection using local searching and global optimization techniques
Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and image retrieval involve tens or hundreds of thousands of features in the dataset. However, many of these features may be irrelevant and redund...
Gespeichert in:
Veröffentlicht in: | Neural computing & applications 2017-10, Vol.28 (10), p.2915-2930 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and image retrieval involve tens or hundreds of thousands of features in the dataset. However, many of these features may be irrelevant and redundant, which only worsen the performance of the learning algorithms, and this may lead to the problem of overfitting. These superfluous features only degrade the accuracy and the computation time of a classification algorithm. So, the selection of relevant and nonredundant features is an important preprocessing step of any classification problem. Most of the global optimization techniques have the ability to converge to a solution quickly, but these begin with initializing a population randomly and the choice of initial population is an important step. In this paper, local searching algorithms have been used for generating a subset of relevant and nonredundant features; thereafter, a global optimization algorithm has been used so as to remove the limitations of global optimization algorithms, like lack of consistency in classification results and very high time complexity, to some extent. The computation time and classification accuracy are improved by using a feature set obtained from sequential backward selection and mutual information maximization algorithm which is fed to a global optimization technique (genetic algorithm, differential evolution or particle swarm optimization). In this proposed work, the computation time of these global optimization techniques has been reduced by using variance as stopping criteria. The proposed approach has been tested on publicly available Sonar, Wdbc and German datasets. |
---|---|
ISSN: | 0941-0643 1433-3058 |
DOI: | 10.1007/s00521-017-2959-y |