Proposing suitable data imputation methods by adopting a Stage wise approach for various classes of smart meters missing data – Practical approach
Published in: | Expert systems with applications 2022-01, Vol.187, p.115911, Article 115911 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
Summary: |
• Introducing a stage-wise, data-driven approach for detecting and classifying bad data.
• Introducing a novel PSO-based multiple imputation for imputing missing values.
• Classifying missing values into four different classes.
• Imputing missing values in an artificial manner to obtain complete time-series data.
• Suggesting imputation methods using “Divide” and “Divide-Combine” strategies.
In recent years, prediction models built from smart meter data have become essential for analyzing customer load behavior, and such models are reliable only when the underlying time series is complete. In practice, however, smart meter data contain missing values, which degrade the accuracy of the prediction model. Customers studied in the literature so far have mostly belonged to the residential, commercial, and industrial categories. This paper proposes a six-stage missing-value treatment approach involving particle swarm optimization (PSO) that identifies, classifies, and imputes missing values, using multiple data imputation algorithms, in the smart meter dataset of an Indian institution. A real-time institutional smart meter dataset for 2019 is considered, in which the missing values are classified into four classes based on their position and occurrence. A complete time-series dataset from Kaggle is also used to test the validity of the proposed approach. Simulation results obtained with the R statistical package and MATLAB indicate that, for each class of missing values, one of the imputation algorithms performs best, and its performance is validated using the root mean square error (RMSE). |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2021.115911 |
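
The summary describes validating imputation on a complete time series by removing values artificially and scoring the imputed values with RMSE. A minimal sketch of that evaluation loop is given below. It is not the paper's method: plain linear interpolation stands in for the PSO-based multiple imputation, and the readings, the masked positions, and the function names `linear_impute` and `rmse` are hypothetical choices made for illustration.

```python
import math


def linear_impute(series):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; gaps at either edge copy the nearest observed value.
    (Stand-in imputer, assumed for illustration only.)"""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, positions):
    """Root mean square error, computed only over the masked positions."""
    squared = [(truth[i] - estimate[i]) ** 2 for i in positions]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical complete load readings (not taken from the paper's datasets).
complete = [1.0, 2.0, 4.0, 4.0, 5.0, 3.0, 3.0, 2.0]

# Remove values artificially so that the ground truth stays available.
masked_positions = [2, 5]
masked = [None if i in masked_positions else v for i, v in enumerate(complete)]

imputed = linear_impute(masked)
score = rmse(complete, imputed, masked_positions)
print(f"RMSE over masked positions: {score:.3f}")
```

Scoring only the masked positions keeps the untouched readings from diluting the error, so different imputation algorithms can be compared on exactly the values they had to reconstruct.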