Monte Carlo Simulation-Based Regression Tree Algorithm for Predicting Energy Consumption from Scarce Dataset

Bibliographic details
Published in: Journal of Data Science and Intelligent Systems, 2024-04
Main authors: Darmanto, Tony; Tjen, Jimmy; Hoendarto, Genrawan
Format: Article
Language: English
Online access: Full text
Description
Summary: Most data-driven techniques depend on the availability of data, so an algorithm may not work as intended when too little data is provided. It is therefore important to be able to predict the dynamics of the data even when only a few samples are available. This study aimed to predict the power consumption of a building from a scarce dataset via a novel Monte Carlo simulation-based Regression Tree (MCRT) algorithm. The main idea is to train a Monte Carlo simulation on each leaf generated by the regression tree algorithm, so that the prediction no longer depends on the average of the samples contained in the leaf but on the probability of the samples. The proposed algorithm was validated on two datasets, obtained from Universitas Widya Dharma Pontianak (UWDP), Indonesia, and the Trapeznikov Institute of Control Sciences (TICS), Russia. To show that the MCRT algorithm is better than the regression tree (RT) algorithm, a two-tailed hypothesis test was proposed. The experiments were run in Python on a 7th Gen Core i7 machine with 16 GB RAM, using 50 datasets randomly generated from the UWDP electrical data. Based on these experiments, the MCRT algorithm performs better than the RT algorithm previously used to model scarce datasets (P-value = 0.000319). Furthermore, the proposed algorithm improves the predictive accuracy of the RT algorithm by up to 2%.
Received: 30 December 2023 | Revised: 21 March 2024 | Accepted: 8 April 2024
Conflicts of Interest: The authors declare that they have no conflicts of interest to this work.
Data Availability Statement: The data that support the findings of this study are openly available in Google Drive at https://docs.google.com/spreadsheets/d/1o8sawOaOcX1kEm-dIdkcCUZhKoBduTAz/edit?usp=drive_link&ouid=115962907255429746256&rtpof=true&sd=true
Author Contribution Statement: Tony Darmanto: Conceptualization, Validation, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration. Jimmy Tjen: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. Genrawan Hoendarto: Validation, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing.
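
The leaf-level idea in the summary (replace the leaf average with a value derived from the probability of the leaf's samples) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify how the Monte Carlo simulation is set up, so the rule used here (resample the leaf's targets, histogram the draws, return the centre of the most frequent bin) is an assumption, and the names `best_threshold`, `mc_leaf_value`, `fit_stump`, and `predict` are hypothetical. For brevity the tree is a single split on one feature, and the data must contain at least two distinct feature values.

```python
import random
from collections import Counter
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the usual regression-tree impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(xs, ys):
    """Try every distinct x as a split point and keep the one with the lowest total SSE."""
    best_score, best_t = None, None
    for t in sorted(set(xs))[1:]:  # skip the smallest x so the left leaf is never empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_score, best_t = score, t
    return best_t


def mc_leaf_value(leaf_ys, n_draws=5000, n_bins=5, seed=0):
    """ASSUMED leaf rule, not taken from the paper: draw from the leaf's empirical
    distribution, histogram the draws, and return the centre of the most frequent bin."""
    lo, hi = min(leaf_ys), max(leaf_ys)
    if lo == hi:  # all targets identical, nothing to simulate
        return lo
    width = (hi - lo) / n_bins
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    draws = rng.choices(leaf_ys, k=n_draws)  # Monte Carlo resampling with replacement
    counts = Counter(min(int((d - lo) / width), n_bins - 1) for d in draws)
    top_bin = counts.most_common(1)[0][0]
    return lo + (top_bin + 0.5) * width


def fit_stump(xs, ys):
    """Fit a one-split tree and store both the mean-based and the simulated leaf values."""
    t = best_threshold(xs, ys)
    left = [y for x, y in zip(xs, ys) if x < t]
    right = [y for x, y in zip(xs, ys) if x >= t]
    return {
        "threshold": t,
        "rt": (mean(left), mean(right)),                      # classic leaf averages
        "mcrt": (mc_leaf_value(left), mc_leaf_value(right)),  # assumed Monte Carlo rule
    }


def predict(model, x, kind="mcrt"):
    """Return the stored leaf value for x; kind is 'rt' or 'mcrt'."""
    left_value, right_value = model[kind]
    return left_value if x < model["threshold"] else right_value
```

On a leaf whose targets are skewed by an outlier, the assumed rule returns a value near the cluster where most samples fall, whereas the leaf average is pulled toward the outlier. Whether this matches the behaviour of the published MCRT algorithm would need to be checked against the full text.
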
ISSN:2972-3841
DOI:10.47852/bonviewJDSIS42022395