A weighted hybrid ensemble method for classifying imbalanced data

Bibliographic details
Published in:	Knowledge-Based Systems, 2020-09, Vol. 203, Article 106087
Main authors:	Zhao, Jiakun; Jin, Ju; Chen, Si; Zhang, Ruifeng; Yu, Bilin; Liu, Qingfang
Format:	Article
Language:	English
Keywords:	Algorithms, Base classifiers, Binary classification, Boosting algorithm, Classification, Classifiers, Data imbalance, Data mining, Data sampling, Data sampling methods, Datasets, Machine learning, Methods, Performance evaluation, Sampling methods

Description
Abstract:	Most real-world datasets are imbalanced. Data imbalance occurs when the number of instances in some classes greatly exceeds the number of instances in other classes, and it can have adverse effects in both data mining and machine learning. Current methods for addressing data imbalance fall into three groups: data-level methods, algorithm-level methods and hybrid methods. In this paper, we propose WHMBoost, a weighted hybrid ensemble method for classifying imbalanced data in binary classification tasks. Within the boosting framework, the method combines two data sampling methods and two base classifiers and assigns a weight to each sampling method and each base classifier, so that their strengths complement one another more effectively. WHMBoost is evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost and SMOTEBoost, using AUC, F-measure and geometric mean (G-mean) as performance evaluation criteria. The experimental results show a significant improvement over the other methods, and we conclude that WHMBoost is a promising and effective algorithm for imbalanced datasets.
ISSN:	0950-7051, 1872-7409
DOI:	10.1016/j.knosys.2020.106087
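The abstract above defines imbalance in terms of class counts and names three evaluation criteria: AUC, F-measure and geometric mean. The following is a minimal Python sketch of how these quantities are commonly computed for a binary task. It is not the paper's code, and the paper's exact definitions may differ. It assumes the minority class is labelled 1 and treated as the positive class, uses F1 (β = 1) for the F-measure, and computes AUC with the pairwise (Mann-Whitney) formulation. The function names and the "imbalance ratio" (majority count divided by minority count) are illustrative conventions, not taken from the source.

```python
import math
from typing import Sequence, Tuple


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Majority-class count divided by minority-class count (binary 0/1 labels)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating label 1 (assumed minority) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score on the positive class: harmonic mean of precision and recall."""
    tp, fn, fp, _ = confusion(y_true, y_pred)
    if tp == 0:
        return 0.0  # recall is zero and precision is zero or undefined, so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (positive-class recall) and specificity."""
    tp, fn, fp, tn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 2 minority (1) vs 6 majority (0) instances.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
    scores = [0.9, 0.4, 0.5, 0.1, 0.2, 0.3, 0.35, 0.8]
    majority_only = [0] * len(y_true)  # always predicts the majority class

    print(imbalance_ratio(y_true))           # 3.0
    print(f_measure(y_true, y_pred))         # 0.5
    print(round(g_mean(y_true, y_pred), 4))  # 0.6455
    print(round(auc(y_true, scores), 4))     # 0.8333
    print(f_measure(y_true, majority_only), g_mean(y_true, majority_only))  # 0.0 0.0
```

The majority-only predictor reaches 75% accuracy on this toy data, yet its F-measure and G-mean are both zero. This illustrates why imbalance-aware criteria such as these are used instead of plain accuracy.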