Learning automata based particle swarm optimization for solving class imbalance problem

Bibliographic details
Published in: Applied Soft Computing, 2021-12, Vol. 113, p. 107959, Article 107959
Main authors: Chakraborty, Anuran; Ghosh, Kushal Kanti; De, Rajonya; Cuevas, Erik; Sarkar, Ram
Format: Article
Language: English
Description
Summary: Class imbalance is an important problem in many domains, such as disease classification, network intrusion detection, fraud detection, and spam filtering. On imbalanced datasets, traditional supervised machine learning algorithms often fail to provide acceptable results. Several approaches are used to handle the class imbalance problem. Of these, undersampling approaches, which reduce the number of instances in the majority class, are the ones most commonly adopted by researchers. Selecting instances from the majority class can be treated as an optimization problem. To this end, this paper presents an undersampling approach based on the widely used Particle Swarm Optimization (PSO) algorithm. The majority class samples are first clustered to form the initial undersampled set. The samples to be selected are then optimized using PSO to give the best model. The parameters of PSO are fine-tuned using Learning Automata. Metrics suited to class imbalance problems are used to construct the fitness function for optimizing the undersampled training set. The proposed method achieves a 2% to 10% performance improvement over most contemporary methods on various datasets with imbalance ratios ranging from 5 to 130, showing that the method is robust and useful in practical scenarios. The code of the proposed method can be accessed via https://github.com/kkg1999/Undersampling.

Highlights:
• A modified PSO, iPSO, is used for undersampling imbalanced datasets.
• The initial population of PSO is generated via clustering.
• Learning automata are used for tuning the parameters.
• The method is robust, as it performs well on datasets of varying imbalance ratio.
• The proposed method outperforms many state-of-the-art methods on the datasets.
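
The idea of treating majority-class instance selection as an optimization problem can be illustrated with a small sketch. This is not the authors' iPSO (their implementation is at the GitHub URL above); it is a generic binary PSO under several assumptions that do not come from the record: a nearest-centroid classifier, G-mean as the fitness metric, a sigmoid transfer function for bit updates, random rather than cluster-based initialization, and fixed values of `w`, `c1`, and `c2` in place of Learning Automata tuning. All function names are hypothetical.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = minority, 0 = majority)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(mask, majority, minority):
    """Score a 0/1 mask over the majority class (assumed fitness, not the paper's).

    A nearest-centroid classifier is fitted on the selected majority samples
    plus all minority samples, then scored by G-mean on the full data.
    """
    chosen = [x for x, keep in zip(majority, mask) if keep]
    if not chosen:
        return 0.0  # an empty majority selection cannot define a classifier
    c_maj, c_min = centroid(chosen), centroid(minority)
    data = majority + minority
    y_true = [0] * len(majority) + [1] * len(minority)
    y_pred = [1 if sq_dist(x, c_min) < sq_dist(x, c_maj) else 0 for x in data]
    return g_mean(y_true, y_pred)


def binary_pso_undersample(majority, minority, n_particles=10, n_iter=30,
                           w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a majority-class subset with binary PSO; returns (mask, fitness)."""
    rng = random.Random(seed)
    dim = len(majority)
    # Random initial masks (the paper instead seeds the population via clustering).
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, majority, minority) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep exp() stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))  # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], majority, minority)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling `binary_pso_undersample(majority, minority)` with two lists of feature vectors returns a 0/1 mask over the majority samples and its fitness. In a more faithful setup the fitness would be measured on held-out data with the classifier and metrics chosen in the paper, and `w`, `c1`, and `c2` would be adapted during the run rather than fixed.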
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2021.107959