Estimator learning automata for feature subset selection in high‐dimensional spaces, case study: Email spam detection


Bibliographic details
Published in: International journal of communication systems 2018-05, Vol.31 (8), p.n/a
Main authors: Seyyedi, Seyyed Hossein; Minaei‐Bidgoli, Behrouz
Format: Article
Language: English
Description
Summary: One of the difficult challenges facing data miners is that algorithm performance degrades if the feature space contains redundant or irrelevant features. Therefore, as a critical preprocessing task, dimension reduction is used to build a smaller space containing valuable features. There are two approaches to dimension reduction: feature extraction and feature selection, the latter of which is divided into wrapper and filter approaches. In high‐dimensional spaces, feature extraction and wrapper approaches are not applicable because of their time complexity. The filter approach, on the other hand, suffers from inaccuracy. One main reason for this inaccuracy is that the subset size is not determined with the specifics of the problem in mind. In this paper, we propose ESS (estimator learning automaton‐based subset selection) as a new method for feature selection in high‐dimensional spaces. The innovation of ESS is that it combines wrapper and filter ideas and uses estimator learning automata to efficiently determine a feature subset that leads to a desirable tradeoff between the accuracy and efficiency of the learning algorithm. To find a qualified subset for a particular processing algorithm operating on an arbitrary dataset, ESS uses an automaton to score each candidate subset based on the size of the subset and the accuracy of the learning algorithm that uses it. In the end, the subset with the highest score is returned. We have used ESS for feature selection in the framework of spam detection, a text classification task for email as a pervasive communication medium. The results show that ESS achieves the goal stated above.
ISSN: 1074-5351, 1099-1131
DOI: 10.1002/dac.3541
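
The summary describes an automaton that scores candidate subsets by their size and by the accuracy of the learner using them, then returns the highest-scoring subset. The record does not give the actual ESS update rules or score formula, so the following is only a minimal sketch of the general idea, using a standard pursuit-type estimator learning automaton. Everything named here is an assumption for illustration: the function `pursuit_select`, the weighted score (`weight` times accuracy plus the remainder times one minus relative subset size), the candidate sizes, and the `toy_accuracy` curve standing in for a real wrapper evaluation.

```python
import random


def pursuit_select(candidates, evaluate, n_features, weight=0.7,
                   steps=200, rate=0.05, seed=0):
    """Pick a candidate subset size with a pursuit-type estimator automaton.

    candidates : subset sizes, e.g. the top-k features of a filter ranking
    evaluate   : callable k -> accuracy in [0, 1] (hypothetical wrapper step)
    n_features : total number of features in the original space
    """
    rng = random.Random(seed)
    r = len(candidates)
    probs = [1.0 / r] * r   # action probability vector
    totals = [0.0] * r      # summed scores per action
    counts = [0] * r        # number of times each action was tried

    def score(k):
        # Assumed score: trade accuracy against relative subset size.
        return weight * evaluate(k) + (1.0 - weight) * (1.0 - k / n_features)

    # Try every action once so that each estimate is defined.
    for i, k in enumerate(candidates):
        totals[i] += score(k)
        counts[i] += 1

    for _ in range(steps):
        # Sample an action according to the current probabilities.
        i = rng.choices(range(r), weights=probs)[0]
        totals[i] += score(candidates[i])
        counts[i] += 1
        estimates = [t / c for t, c in zip(totals, counts)]
        best = max(range(r), key=estimates.__getitem__)
        # Pursuit update: shift probability mass toward the action
        # with the highest running estimate.
        probs = [(1.0 - rate) * p for p in probs]
        probs[best] += rate

    estimates = [t / c for t, c in zip(totals, counts)]
    best = max(range(r), key=estimates.__getitem__)
    return candidates[best], probs, estimates


def toy_accuracy(k):
    # Made-up saturating accuracy curve; a real run would train and
    # validate the learner on the top-k features instead.
    return 0.95 * k / (k + 20)


sizes = [10, 50, 100, 200, 500, 1000]
best_k, probs, estimates = pursuit_select(sizes, toy_accuracy, n_features=1000)
print(best_k)
```

With this toy curve the accuracy gain flattens out while the size penalty keeps growing, so the sketch settles on a mid-sized subset (200 features) instead of the largest one. In practice `evaluate` would be noisy, which is where keeping running estimates per action becomes useful.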