Machine learning classification algorithms and anomaly detection in conventional meters and Tunisian electricity consumption large datasets

Detailed description

Bibliographic details
Published in: Computers & electrical engineering 2021-09, Vol.94, p.107329, Article 107329
Main authors: Oprea, Simona-Vasilica; Bâra, Adela
Format: Article
Language: English
Description
Abstract:
Highlights:
• Electricity fraud detection in conventional meters.
• Combining supervised and unsupervised ML algorithms.
• Extensive feature engineering.
• Multivariate Gaussian distribution for anomaly detection.
• Light Gradient Boosting algorithm: performance and tuning.

Fraud in electricity consumption is easier to detect when consumption is recorded hourly by smart meters, but in most developing countries, where the propensity for fraud is higher, smart meters are not yet affordable. With time series logging, the periodicity and variability of consumption reveal deviations from a regular consumption pattern. In contrast, fraud detection with conventional meters remains a significant challenge because anomalous consumption is well hidden within the normal consumption of other consumers. In this paper, large datasets of consumer and invoice data from Tunisia are combined and investigated with several Machine Learning (ML) classification algorithms to detect irregularities in electricity consumption. With extensive feature engineering, including features derived from a multivariate Gaussian distribution, ensemble classifiers such as Light Gradient Boosting (LGB) outperform the other algorithms and achieve realistic performance on challenging, unbalanced and uncorrelated input datasets.
ISSN: 0045-7906
EISSN: 1879-0755
DOI: 10.1016/j.compeleceng.2021.107329
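
Illustrative sketch: the abstract names a multivariate Gaussian distribution as part of the feature engineering for anomaly detection. This record does not describe how the authors apply it, so the following is only a minimal sketch of the general technique: fit a Gaussian to consumer features and treat a low log-density as a sign of irregular consumption. The two features (average billed consumption and number of invoices), the sample values, and the function names fit_gaussian_2d and log_density are assumptions made for illustration and are not taken from the paper.

```python
import math

def fit_gaussian_2d(points):
    """Estimate the mean vector and 2x2 covariance of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at `point`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)

# Hypothetical per-consumer features: (average billed kWh, invoice count).
regular = [(100, 10), (110, 11), (95, 9), (105, 10),
           (98, 10), (102, 11), (107, 9), (93, 10)]
suspect = (20, 10)  # unusually low consumption for the same invoice count

mean, cov = fit_gaussian_2d(regular)
regular_scores = [log_density(p, mean, cov) for p in regular]
suspect_score = log_density(suspect, mean, cov)

# Assumed rule: flag anything less likely than every regular consumer.
threshold = min(regular_scores)
is_anomaly = suspect_score < threshold
print(is_anomaly)
```

In a pipeline like the one the abstract outlines, such a score could be added as one more input column for a supervised classifier such as a gradient boosting model; whether the authors do exactly this is not stated here.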