Imbalanced Industrial Load Identification Based on Optimized CatBoost with Entropy Features

Bibliographic details
Published in: Journal of Electrical Engineering & Technology, 2024, 19(8), pp. 4817-4832
Main authors: Lin, Lin; Ma, Xueli; Chen, Cheng; Xu, Jinhao; Huang, Nantian
Format: Article
Language: English
Description
Abstract: Industrial load sample data are imbalanced across categories, which lowers classification performance for the categories with few samples. An imbalanced industrial load identification method based on optimized CatBoost with entropy features is proposed. First, multiple original samples of industrial load data and their corresponding switch states are selected from the dataset. Each original sample is segmented in the time domain into three intervals. From each interval, 27 time-domain features, including 8 types of entropy features, are extracted, yielding an 81-dimensional original feature set. Next, feature importance is calculated and ranked using the Prediction Value Change method. The optimal subset of classification features for the corresponding device is then determined through forward feature selection, with CatBoost classification accuracy as the decision variable. After that, the Borderline-SMOTE method is used to synthesize samples, producing balanced switching sample data. Finally, a CatBoost classifier whose hyperparameters are tuned with Bayesian optimization and Hyperband is constructed to identify industrial loads. The experimental results show that the method extracts features efficiently and identifies imbalanced, small-sample data with high accuracy.
ISSN: 1975-0102, 2093-7423
DOI: 10.1007/s42835-024-01933-5
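
Illustrative sketch (not part of the catalog record or the paper): the abstract's feature construction step splits each sample into three time-domain intervals and extracts 27 features per interval, which gives 3 x 27 = 81 dimensions. The Python below is a minimal sketch of that idea under stated assumptions. It computes only one stand-in feature per interval, a histogram-based Shannon entropy. The abstract does not name the eight entropy types, the bin count, or how the interval boundaries are chosen, so the equal-length split, the `n_bins` parameter, and all function names here are hypothetical.

```python
import math
from collections import Counter


def split_into_intervals(sample, n_intervals=3):
    """Split a 1-D sequence into contiguous, near-equal intervals.

    Assumption: equal-length intervals. The abstract only says that
    each sample is divided into three time-domain intervals.
    """
    n = len(sample)
    bounds = [(i * n) // n_intervals for i in range(n_intervals + 1)]
    return [sample[bounds[i]:bounds[i + 1]] for i in range(n_intervals)]


def shannon_entropy(segment, n_bins=8):
    """Shannon entropy in bits of a fixed-width histogram of the values.

    Assumption: this is a stand-in for one entropy feature, not
    necessarily one of the eight types used in the paper.
    """
    if not segment:
        return 0.0
    lo, hi = min(segment), max(segment)
    if hi == lo:
        # A constant segment carries no uncertainty.
        return 0.0
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interval_features(sample, n_intervals=3):
    """Return one entropy value per interval, concatenated in time order."""
    return [shannon_entropy(seg) for seg in split_into_intervals(sample, n_intervals)]


# Feature-set size stated in the abstract: 3 intervals x 27 features each.
N_INTERVALS, FEATURES_PER_INTERVAL = 3, 27
ORIGINAL_FEATURE_DIM = N_INTERVALS * FEATURES_PER_INTERVAL  # 81
```

With all 27 features implemented, `interval_features` would return an 81-element vector per sample, which is the input that the importance ranking and forward feature selection steps would then operate on.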