Mitigating an adoption barrier of reinforcement learning-based control strategies in buildings
Published in: Energy and Buildings, 2023-04, Vol. 285, p. 112878, Article 112878
Main authors:
Format: Article
Language: English
Abstract:
Reinforcement learning (RL) algorithms have shown great promise in controlling building systems to minimize energy use, operational cost, and occupant discomfort. RL agents learn a control policy by interacting with the physical or simulated environment that represents building systems, occupants, and the outside world. Yet, a large amount of data is needed to learn a near-optimal control policy in the physical building, which requires months or years to collect. Moreover, an agent’s performance while training can be quite poor, causing occupant discomfort and additional costs. Learning in simulation does not have such real-world impacts, but differences between buildings, and indeed between simulation and physical buildings, potentially lead to poor performance when a policy learned in simulation is deployed in a physical building. This paper addresses part of the sim-to-real problem by training on one set of simulated (source) buildings and then deploying to a novel simulated (target) building. This approach significantly reduces the training cost of RL on the target building by (1) learning a large number of policies on prototype buildings, (2) evaluating these policies on historical data obtained from the target building’s environment and selecting the best ones according to the evaluation result, and (3) using the best policies to control the target building while continuing to learn. The proposed approach involves learning a diverse population of control policies using a novel diversity-induced RL algorithm, and policy clustering, evaluation, and selection techniques. Three case studies show our approach assigns policies to the target building that outperform the default controller by 4.0–30.4%, without sacrificing thermal comfort. Similarly, they outperform policies that are learned only on the target building (i.e., without transfer) by 24.9–74.9% and 16.2–72.2% before and after 500 months of training, respectively.
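To make the policy-selection step (2) above concrete, the following is a minimal, hypothetical Python sketch of how a population of pre-trained policies could be scored against historical data logged in the target building, with the top performers handed over for continued learning. All names (Policy, score_on_history, select_best), the crude action-matching score, and the toy data are illustrative assumptions, not the paper's actual algorithm or code.

```python
# Hypothetical sketch of the abstract's step (2): score a population of
# pre-trained policies against historical (logged) data from the target
# building and keep the best ones for deployment. Illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple
import random

State = Tuple[float, ...]   # e.g. (outdoor temp, zone temp, hour of day)
Action = float              # e.g. a supply-air temperature setpoint


@dataclass
class Policy:
    name: str
    act: Callable[[State], Action]


def score_on_history(policy: Policy,
                     transitions: Sequence[Tuple[State, Action, float]]) -> float:
    """Crude off-policy score: average reward of logged steps whose logged
    action is close to what the policy would have chosen (a stand-in for
    the paper's evaluation on historical data)."""
    matched = [r for (s, a, r) in transitions if abs(policy.act(s) - a) < 0.5]
    return sum(matched) / max(len(matched), 1)


def select_best(policies: List[Policy],
                transitions: Sequence[Tuple[State, Action, float]],
                k: int = 3) -> List[Policy]:
    """Rank the policy population by its historical score and keep the top k."""
    return sorted(policies,
                  key=lambda p: score_on_history(p, transitions),
                  reverse=True)[:k]


if __name__ == "__main__":
    # Fake historical log and a toy policy population, for illustration only.
    history = []
    for h in range(1000):
        state = (20.0 + random.random(), 22.0, float(h % 24))
        action = 20.5 + random.random()        # what the existing controller did
        reward = -abs(action - 21.0)           # toy comfort/energy penalty
        history.append((state, action, reward))

    population = [Policy(f"proto-{i}", lambda s, b=i: 20.0 + 0.2 * b)
                  for i in range(10)]

    best = select_best(population, history)
    print("Policies chosen for deployment:", [p.name for p in best])
```

In this sketch the selected policies would then control the target building while training continues (step (3) in the abstract); the scoring function shown here is deliberately simplistic and stands in for whatever off-policy evaluation the paper actually uses.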
ISSN: 0378-7788
DOI: 10.1016/j.enbuild.2023.112878