Building HVAC control with reinforcement learning for reduction of energy cost and demand charge
Published in: Energy and Buildings, 2021-05, Vol. 239, p. 110833, Article 110833
Main authors: , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Graphical abstract caption (figure omitted): Training and testing diagram: blue lines indicate the training phase to obtain the optimal weights for the Q-network; red lines indicate the testing phase using the trained Q-network; green lines indicate the common steps shared by both training and testing.
Energy efficiency remains a significant topic in the control of building heating, ventilation, and air-conditioning (HVAC) systems, and a diverse set of control strategies has been developed to optimize performance, including the recently emerging technique of deep reinforcement learning (DRL). While most existing work has focused on minimizing energy consumption, generalization to energy cost minimization under time-varying electricity price profiles and demand charges has rarely been studied. Under these utility structures, significant cost savings can be achieved by pre-cooling buildings in the early morning when electricity is cheaper, thereby reducing expensive afternoon consumption and lowering peak demand. However, correctly identifying these savings requires planning horizons of one day or more. To tackle this problem, we develop a Deep Q-Network (DQN) with an action processor, defining the environment as a Partially Observable Markov Decision Process (POMDP) with a reward function consisting of energy cost (time-of-use and peak demand charges) and a discomfort penalty, which extends the reward functions used in most existing DRL work in this area. Moreover, we develop a reward-shaping technique to overcome the reward sparsity caused by the demand charge. On a single-zone building simulation platform, we demonstrate that the customized DQN outperforms the baseline rule-based policy, saving close to 6% of total cost with demand charges and close to 8% without demand charges.
ISSN: 0378-7788; 1872-6178
DOI: 10.1016/j.enbuild.2021.110833
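
The abstract describes a per-step reward built from time-of-use energy cost, a peak demand charge, and a discomfort penalty, with reward shaping used to densify the sparse demand-charge signal. Below is a minimal Python sketch of such a reward term under stated assumptions: all function names, prices, weights, the comfort band, and the incremental-peak shaping are illustrative guesses, not the paper's actual formulation or parameters.

```python
# Hypothetical sketch of a shaped reward combining time-of-use energy cost,
# an incremental demand charge, and a discomfort penalty. All constants and
# names are illustrative assumptions, not taken from the paper.

def step_reward(power_kw, price_per_kwh, zone_temp,
                peak_so_far_kw=0.0, demand_rate=15.0,
                comfort_band=(22.0, 26.0), discomfort_weight=0.1,
                dt_hours=0.25):
    """Return (reward, updated_peak) for one control interval."""
    # Time-of-use energy cost for this interval.
    energy_cost = power_kw * dt_hours * price_per_kwh

    # The demand charge is billed on the billing-period peak, so penalizing it
    # only at the end of the period gives a sparse signal. A simple shaping
    # idea is to charge the *increment* of the running peak at every step.
    new_peak = max(peak_so_far_kw, power_kw)
    demand_cost = demand_rate * (new_peak - peak_so_far_kw)

    # Quadratic discomfort penalty for zone temperature outside the comfort band.
    low, high = comfort_band
    violation = max(low - zone_temp, zone_temp - high, 0.0)
    discomfort = discomfort_weight * violation ** 2

    reward = -(energy_cost + demand_cost) - discomfort
    return reward, new_peak


# Example use inside a simulation loop (values are arbitrary):
r, peak = step_reward(power_kw=35.0, price_per_kwh=0.12,
                      zone_temp=26.8, peak_so_far_kw=30.0)
```

The agent would carry the running peak in its state (or observation history, given the POMDP setting) and accumulate these shaped terms over the episode so that the total matches the actual bill plus the comfort penalty.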