Hybrid fast unsupervised feature selection for high-dimensional data

Bibliographic details
Published in: Expert systems with applications 2019-06, Vol.124, p.97-118
Main authors: Manbari, Zhaleh; AkhlaghianTab, Fardin; Salavati, Chiman
Format: Article
Language: English
Description
Summary:
• Propose a new hybrid feature selection algorithm based on BACO and clustering.
• Modify the linear binary ant system to reduce the search space complexity.
• Inject mutation to increase the randomness of the search.
• Cluster features to reduce the challenges of processing high-dimensional datasets.
• Evaluate the method on several real-world social datasets and obtain higher efficiency.

The "curse of dimensionality" that arises in high-dimensional datasets degrades the capability of learning algorithms and also incurs high memory and computational costs. Feature selection, which discards redundant and irrelevant features, is a crucial machine learning technique for reducing the dimensionality of these datasets and thereby improving the performance of the learning algorithm. Feature selection has been extensively applied in many application areas relevant to expert and intelligent systems, such as data mining and machine learning. Although many algorithms have been developed so far, they remain unsatisfactory when confronted with high-dimensional data. This paper presented a new hybrid filter-based feature selection algorithm based on a combination of clustering and the modified Binary Ant System (BAS), called FSCBAS, to overcome the search space and high-dimensional data processing challenges efficiently. This model provided both global and local search capabilities between and within clusters. In the proposed method, inspired by genetic algorithms and simulated annealing, a damped mutation strategy was introduced to avoid falling into local optima, and a new redundancy reduction policy, which estimates the correlation between the selected features, further improved the algorithm.
The proposed method can be applied in many expert system applications such as microarray data processing, text classification and image processing in high-dimensional data to handle the high dimensionality of the feature space and improve classification performance simultaneously. The performance of the proposed algorithm was compared to that of state-of-the-art feature selection algorithms using different classifiers on real-world datasets. The experimental results confirmed that the proposed method reduced computational complexity significantly, and achieved better performance than the other feature selection methods.
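
The general idea described in the summary can be illustrated with a small sketch: group correlated features, sample binary inclusion masks guided by per-feature pheromone values, flip bits with a mutation rate that decays over iterations, and penalise correlated selections. This is not the authors' FSCBAS implementation. The greedy clustering rule, the variance-based relevance measure, the scoring function, and every function and parameter name below are assumptions chosen only for illustration.

```python
import numpy as np

def cluster_features(X, threshold=0.8):
    """Greedy grouping (assumed rule): a feature joins the first cluster whose
    first member it correlates with at |r| >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    clusters = []
    for j in range(X.shape[1]):
        for c in clusters:
            if corr[j, c[0]] >= threshold:
                c.append(j)
                break
        else:
            clusters.append([j])
    return clusters, corr

def subset_score(mask, relevance, corr, lam=1.0):
    """Summed relevance minus lam times summed pairwise |correlation|."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -np.inf
    sub = corr[np.ix_(idx, idx)]
    redundancy = (sub.sum() - np.trace(sub)) / 2.0  # each unordered pair once
    return relevance[idx].sum() - lam * redundancy

def select_features(X, n_ants=20, n_iter=50, rho=0.1,
                    p_mut0=0.2, decay=0.95, seed=0):
    rng = np.random.default_rng(seed)
    clusters, corr = cluster_features(X)
    var = X.var(axis=0)
    relevance = var / var.max()      # assumed unsupervised relevance proxy
    n = X.shape[1]
    tau = np.full(n, 0.5)            # pheromone: probability of bit = 1
    best_mask, best_score = np.zeros(n, dtype=bool), -np.inf
    for t in range(n_iter):
        p_mut = p_mut0 * decay ** t  # damped mutation rate
        for _ in range(n_ants):
            mask = rng.random(n) < tau
            mask = mask ^ (rng.random(n) < p_mut)  # random bit flips
            # Within each cluster keep only the most relevant selected feature.
            for c in clusters:
                chosen = [j for j in c if mask[j]]
                if len(chosen) > 1:
                    keep = max(chosen, key=lambda j: relevance[j])
                    mask[chosen] = False
                    mask[keep] = True
            score = subset_score(mask, relevance, corr)
            if score > best_score:
                best_mask, best_score = mask.copy(), score
        # Evaporate pheromone and reinforce the best-so-far mask.
        tau = np.clip((1 - rho) * tau + rho * best_mask, 0.05, 0.95)
    return np.flatnonzero(best_mask), best_score
```

Calling `select_features(X)` on a samples-by-features array returns the selected column indices and their score. Because every sampled mask is reduced to at most one feature per cluster before scoring, near-duplicate features never appear together in the result; the decaying `p_mut` lets early iterations explore widely while later ones settle around the reinforced bits.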
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2019.01.016