An approach for feature selection using local searching and global optimization techniques

Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and image retrieval involve tens or hundreds of thousands of features in the dataset. However, many of these features may be irrelevant and redund...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Neural computing & applications 2017-10, Vol.28 (10), p.2915-2930
Hauptverfasser:	Tiwari, Sadhana, Singh, Birmohan, Kaur, Manpreet
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial Intelligence Classification Combinatorial analysis Combinatorial chemistry Computation Computational Biology/Bioinformatics Computational Science and Engineering Computer Science Data Mining and Knowledge Discovery Evolutionary algorithms Gene expression Genetic algorithms Global optimization Image management Image Processing and Computer Vision Image retrieval Machine learning New Trends in data pre-processing methods for signal and image classification Nonlinear programming Optimization techniques Particle swarm optimization Preprocessing Probability and Statistics in Computer Science Searching
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and image retrieval involve tens or hundreds of thousands of features in the dataset. However, many of these features may be irrelevant and redundant, which only worsen the performance of the learning algorithms, and this may lead to the problem of overfitting. These superfluous features only degrade the accuracy and the computation time of a classification algorithm. So, the selection of relevant and nonredundant features is an important preprocessing step of any classification problem. Most of the global optimization techniques have the ability to converge to a solution quickly, but these begin with initializing a population randomly and the choice of initial population is an important step. In this paper, local searching algorithms have been used for generating a subset of relevant and nonredundant features; thereafter, a global optimization algorithm has been used so as to remove the limitations of global optimization algorithms, like lack of consistency in classification results and very high time complexity, to some extent. The computation time and classification accuracy are improved by using a feature set obtained from sequential backward selection and mutual information maximization algorithm which is fed to a global optimization technique (genetic algorithm, differential evolution or particle swarm optimization). In this proposed work, the computation time of these global optimization techniques has been reduced by using variance as stopping criteria. The proposed approach has been tested on publicly available Sonar, Wdbc and German datasets.
ISSN:	0941-0643 1433-3058
DOI:	10.1007/s00521-017-2959-y