Optimal Feature Selection Based on Discrete Grasshopper Optimization Algorithm and K-nearest Neighbor Classifier
| Published in: | Engineering Letters, 2024-01, Vol. 32 (1), p. 89 |
| ---|--- |
| Main authors: | , , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
Abstract: | In the majority of data mining tasks, feature selection serves as an essential pre-processing step: the most informative attributes are selected to reduce the dimensionality of the data set and enhance classification accuracy. Nature-inspired heuristic algorithms are widely employed for wrapper feature selection. Based on the wrapper feature selection method, seven nature-inspired heuristic algorithms are applied to the feature selection problem and compared: the Slime Mould Algorithm (SMA), Whale Optimization Algorithm (WOA), Harris Hawks Optimization (HHO), Marine Predators Algorithm (MPA), Butterfly Optimization Algorithm (BOA), Cuckoo Search (CS), and Firefly Algorithm (FA). Performance tests are carried out on 21 standard UCI data sets to verify the behavior of the algorithms, and the convergence curves and accuracy boxplots of the seven algorithms on the 21 data sets are given. The simulation outcomes are assessed using the mean and standard deviation of fitness, the number of selected features, and the running time, with the optimal value shown in bold. Comparing these comprehensive performance indexes, MPA obtained the best mean fitness value on most data sets (16 data sets), followed by FA (6 data sets). SMA selected the fewest features on 20 data sets and has an advantage in computing time. |
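The wrapper approach the abstract describes can be sketched as follows: each candidate solution is a binary feature mask, scored by a fitness that trades off classifier error against the fraction of selected features. The sketch below uses a leave-one-out 1-NN classifier and, in place of the paper's discrete metaheuristics, a plain random search over masks; the toy data, the `alpha`/`beta` weights, and all function names are illustrative assumptions, not the paper's exact implementation.

```python
import math
import random

def one_nn_error(X, y, mask):
    """Leave-one-out error of a 1-NN classifier restricted to features in `mask`."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 1.0  # empty feature subset: worst possible error
    errors = 0
    for i in range(len(X)):
        best_d, best_label = math.inf, None
        for k in range(len(X)):
            if k == i:
                continue  # leave-one-out: skip the query point itself
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        errors += best_label != y[i]
    return errors / len(X)

def fitness(X, y, mask, alpha=0.99, beta=0.01):
    """Weighted sum of classification error and selected-feature ratio (lower is better)."""
    return alpha * one_nn_error(X, y, mask) + beta * sum(mask) / len(mask)

def random_search_wrapper(X, y, n_iter=200, seed=0):
    """Stand-in for the discrete metaheuristic: sample binary masks, keep the best."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    best_mask, best_fit = None, math.inf
    for _ in range(n_iter):
        mask = [rng.randint(0, 1) for _ in range(n_feat)]
        f = fitness(X, y, mask)
        if f < best_fit:
            best_mask, best_fit = mask, f
    return best_mask, best_fit

# Toy data: only feature 0 carries the class signal; features 1-2 are noise,
# so a good wrapper should keep feature 0 in the selected subset.
random.seed(1)
X = [[i % 2 + random.gauss(0, 0.05), random.random(), random.random()]
     for i in range(40)]
y = [i % 2 for i in range(40)]
mask, fit = random_search_wrapper(X, y)
```

In the paper's setting, a discrete grasshopper-style update rule would replace the random-mask sampling, but the fitness evaluation, the KNN wrapper, and the error-versus-subset-size trade-off stay the same.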
| ISSN: | 1816-093X; 1816-0948 |