Data-driven Offline Reinforcement Learning for HVAC-systems

Bibliographic Details
Published in: Energy (Oxford) 2022-12, Vol. 261, p. 125290, Article 125290
Authors: Blad, Christian; Bøgh, Simon; Kallesøe, Carsten Skovmose
Format: Article
Language: English
Online access: Full text
Description
Abstract: This paper presents a novel framework for Offline Reinforcement Learning (RL) with online fine-tuning for Heating, Ventilation, and Air-Conditioning (HVAC) systems. The framework provides a method for pre-training in a black-box model environment, where the black-box models are built on data acquired under a traditional control policy. The paper focuses on the application of Underfloor Heating (UFH) with an air-to-water heat pump; however, the framework should also generalize to other HVAC control applications. Because black-box methods are used, there is little to no commissioning time when applying this framework to buildings/simulations beyond the one presented in this study. This paper explores and deploys Artificial Neural Network (ANN) based methods to design efficient controllers. Two ANN methods are tested and presented: a Multilayer Perceptron (MLP) method and a Long Short-Term Memory (LSTM) based method. The LSTM-based method is found to reduce the prediction error by 45% compared with an MLP model. Additionally, different network architectures are tested; by creating a new model for each time step, performance is improved by a further 19%. Using these models in the presented framework, it is shown that a Multi-Agent RL (MARL) algorithm can be deployed without ever performing worse than an industrial controller. Furthermore, it is shown that if building data from a Building Management System (BMS) is available, an RL agent can be deployed that performs close to optimally from the first day of deployment. An optimal control policy reduces the cost of heating by 19.4% compared to a traditional control policy in the simulation presented in this paper.

Highlights:
• Offline MARL can eliminate poor behavior during training while converging.
• LSTM layers are an effective method for obtaining training models for RL.
• Simulation model of an underfloor heating system supplied by a heat pump.
• Simulations show cost is reduced by 19.4% compared to traditional controllers.
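As a loose illustration of the black-box modeling step described above, the sketch below shows an LSTM dynamics model of the kind the abstract refers to: it is fit on logged BMS data gathered under the existing control policy and can then be queried step by step as a simulation environment for offline RL pre-training. This is not the authors' implementation; the feature set, window length, network sizes, and all names are illustrative assumptions, written here in PyTorch.

```python
import torch
import torch.nn as nn

class LSTMDynamicsModel(nn.Module):
    """Black-box building model: maps a window of past observations
    and control inputs to the next room temperature."""

    def __init__(self, n_features: int = 4, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # next room temperature

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, seq_len, n_features), e.g. features =
        # [room_temp, supply_temp, outdoor_temp, valve_command] (assumed)
        out, _ = self.lstm(window)
        return self.head(out[:, -1, :])  # predict from the last hidden state

def fit(model, windows, targets, epochs=50, lr=1e-3):
    """Fit the model on an offline dataset logged by the BMS under a
    traditional control policy (random stand-in data in the demo below)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(windows), targets)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    windows = torch.randn(256, 24, 4)  # 256 samples, 24-step history, 4 features
    targets = torch.randn(256, 1)      # next-step room temperature
    model = fit(LSTMDynamicsModel(), windows, targets)
    with torch.no_grad():
        # One-step prediction; an RL agent would roll such predictions
        # out repeatedly to use the model as its training environment.
        print(model(windows[:1]))
```

Note that the abstract reports a further accuracy gain from training a separate model for each prediction time step; the sketch above uses a single shared model for brevity.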
ISSN: 0360-5442
DOI: 10.1016/j.energy.2022.125290