Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift
A data stream can be defined as a system that continually generates a lot of data over time. Today, processing data streams requires new demands and challenging tasks in the data mining and machine learning areas. Concept Drift is a problem commonly characterized as changes in the distribution of th...
Gespeichert in:
Veröffentlicht in: | ACM transactions on knowledge discovery from data 2022-04, Vol.16 (2), p.1-20 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A data stream can be defined as a system that continually generates a lot of data over time. Today, processing data streams requires new demands and challenging tasks in the data mining and machine learning areas.
Concept Drift
is a problem commonly characterized as changes in the distribution of the data within a data stream. The implementation of new methods for dealing with data streams where concept drifts occur requires algorithms that can adapt to several scenarios to improve its performance in the different experimental situations where they are tested. This research proposes a strategy for dynamic parameter adjustment in the presence of concept drifts. Parameter Estimation Procedure (PEP) is a general method proposed for dynamically adjusting parameters which is applied to the diversity parameter (λ) of several classification ensembles commonly used in the area. To this end, the proposed estimation method (PEP) was used to create Boosting-like Online Learning Ensemble with Parameter Estimation (BOLE-PE), Online AdaBoost-based M1 with Parameter Estimation (OABM1-PE), and Oza and Russell’s Online Bagging with Parameter Estimation (OzaBag-PE), based on the existing ensembles BOLE, OABM1, and OzaBag, respectively. To validate them, experiments were performed with artificial and real-world datasets using Hoeffding Tree (HT) as base classifier. The accuracy results were statistically evaluated using a variation of the Friedman test and the Nemenyi post-hoc test. The experimental results showed that the application of the dynamic estimation in the diversity parameter (λ) produced good results in most scenarios, i.e., the modified methods have improved accuracy in the experiments with both artificial and real-world datasets. |
---|---|
ISSN: | 1556-4681 1556-472X |
DOI: | 10.1145/3466616 |