Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning

Detailed description

Bibliographic details
Published in:	Applied Energy 2021-01, Vol. 281, p. 116117, Article 116117
Main authors:	Du, Yan; Zandi, Helia; Kotevska, Olivera; Kurte, Kuldeep; Munk, Jeffery; Amasyali, Kadir; Mckee, Evan; Li, Fangxing
Format:	Article
Language:	English
Keywords:	Actor-critic learning; Deep deterministic policy gradient (DDPG); Deep reinforcement learning (deep RL); Demand response; Multi-zone residential HVAC
Online access:	Full text

Description
Summary:
• A deep reinforcement learning (RL) control strategy for residential HVAC is proposed.
• The control strategy is based on the deep deterministic policy gradient (DDPG) method.
• Simulation results demonstrate the cost-effectiveness and time efficiency of the DDPG method.
• DDPG is compared with a deep Q network (DQN) and baseline cases for verification.
• The generalization of the DDPG method is further verified in different scenarios.

Residential heating, ventilation, and air conditioning (HVAC) is considered an important demand response resource. However, optimizing residential HVAC control is no trivial task because of the complexity of building thermal dynamic models and the uncertainty associated with both occupant-driven heat loads and weather forecasts. In this paper, we apply a novel model-free deep reinforcement learning (RL) method, known as the deep deterministic policy gradient (DDPG), to generate an optimal control strategy for a multi-zone residential HVAC system, with the goal of minimizing energy consumption cost while maintaining the users' comfort. The applied deep RL-based method learns through continuous interaction with a simulated building environment, without referring to any prior model knowledge. Simulation results show that, compared with the state-of-the-art deep Q network (DQN), the DDPG-based HVAC control strategy can reduce the energy consumption cost by 15% and comfort violations by 79%; compared with a rule-based HVAC control strategy, it can reduce comfort violations by 98%. In addition, experiments with different building models and retail price models demonstrate that the well-trained DDPG-based HVAC control strategy generalizes and adapts well to unseen environments, which indicates its practicability for real-world implementation.
ISSN:	0306-2619, 1872-9118
DOI:	10.1016/j.apenergy.2020.116117
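
Illustrative note: the summary states the control goal as minimizing energy consumption cost while maintaining user comfort. One way such a goal could be turned into a per-step RL reward is sketched below. This is a minimal sketch under stated assumptions, not the reward used in the article: the function name hvac_reward, the comfort band of 20 to 24 °C, and the weighting factor comfort_weight are all hypothetical.

```python
from typing import Sequence


def hvac_reward(
    energy_kwh: float,
    price_per_kwh: float,
    zone_temps_c: Sequence[float],
    comfort_low_c: float = 20.0,   # assumed lower comfort bound (hypothetical)
    comfort_high_c: float = 24.0,  # assumed upper comfort bound (hypothetical)
    comfort_weight: float = 1.0,   # assumed cost-vs-comfort trade-off (hypothetical)
) -> float:
    """Hypothetical per-step reward: negative energy cost minus a weighted
    comfort penalty summed over all zones."""
    # Cost of the energy consumed during this control step.
    energy_cost = price_per_kwh * energy_kwh

    # Degrees by which each zone temperature falls outside the comfort band.
    violation = sum(
        max(comfort_low_c - t, 0.0) + max(t - comfort_high_c, 0.0)
        for t in zone_temps_c
    )

    # An RL agent maximizes reward, so both cost terms enter with a minus sign.
    return -(energy_cost + comfort_weight * violation)


if __name__ == "__main__":
    # Two zones: one inside the assumed band, one 1.5 degrees above it.
    print(hvac_reward(energy_kwh=1.5, price_per_kwh=0.10, zone_temps_c=[21.0, 25.5]))
```

A larger comfort_weight would push a learned policy toward comfort at the expense of cost; how the article balances the two terms is not stated in this record.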