Imbalanced Industrial Load Identification Based on Optimized CatBoost with Entropy Features
Published in: | Journal of Electrical Engineering & Technology, 2024, 19(8), pp. 4817-4832 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Industrial load sample data are imbalanced across categories, which leads to low classification performance on minority-class samples. To address this, an imbalanced industrial load identification method based on optimized CatBoost with entropy features is proposed. First, multiple original samples of industrial load data and their corresponding switch states are selected from the dataset. Each original sample is segmented in the time domain into three intervals, and 27 time-domain features, including 8 types of entropy features, are extracted from each interval, yielding an 81-dimensional original feature set. Feature importance is then calculated and ranked using the Prediction Value Change method, and the optimal feature subset for classifying the corresponding device is determined by forward feature selection, with CatBoost classification accuracy as the decision variable. Second, the Borderline-SMOTE method is used to synthesize samples, producing balanced switching sample data. Finally, a CatBoost classifier whose hyperparameters are tuned with Bayesian optimization and Hyperband is constructed to identify industrial loads. Experimental results show that the method extracts features efficiently and identifies imbalanced, small-sample data with high accuracy. |
---|---|
ISSN: | 1975-0102, 2093-7423 |
DOI: | 10.1007/s42835-024-01933-5 |
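
The feature-construction and feature-selection steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, only four illustrative features per interval are computed instead of the 27 described, the histogram-based Shannon entropy stands in for the unspecified 8 entropy types, and `score_fn` is a placeholder for the CatBoost classification accuracy. The selection rule (evaluate the top-k ranked features for each k and keep the best-scoring prefix) is one common reading of forward feature selection and is assumed here.

```python
import math
from collections import Counter


def split_into_intervals(signal, n_intervals=3):
    """Split a sequence into contiguous, nearly equal time-domain intervals.

    Assumes len(signal) >= n_intervals so that no interval is empty.
    """
    n = len(signal)
    bounds = [i * n // n_intervals for i in range(n_intervals + 1)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(n_intervals)]


def shannon_entropy(values, n_bins=8):
    """Shannon entropy (in bits) of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant interval carries no uncertainty
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def interval_features(values):
    """Illustrative per-interval features: mean, RMS, peak, entropy."""
    mean = sum(values) / len(values)
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    peak = max(abs(v) for v in values)
    return [mean, rms, peak, shannon_entropy(values)]


def extract_features(signal, n_intervals=3):
    """Concatenate the features of every interval into one vector."""
    features = []
    for interval in split_into_intervals(signal, n_intervals):
        features.extend(interval_features(interval))
    return features


def forward_select(ranked_features, score_fn):
    """Return the best-scoring prefix of an importance-ranked feature list.

    score_fn is a hypothetical callback that maps a list of feature names
    to a validation score (for example, classifier accuracy).
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score
```

With three intervals and four features per interval, `extract_features` returns a 12-dimensional vector; the same concatenation with 27 features per interval gives the 81 dimensions stated in the abstract.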