Towards safe and sustainable reinforcement learning for real-time strategy games

Bibliographic details
Published in: Information Sciences, 2024-09, Vol. 679, p. 120980, Article 120980
Main authors: Andersen, Per-Arne; Goodwin, Morten; Granmo, Ole-Christoffer
Format: Article
Language: English
Description
Abstract: Combining Deep Neural Networks with Reinforcement Learning, known as Deep Reinforcement Learning (DRL), is revolutionizing fields like medicine, industry, and gaming. DRL has achieved groundbreaking results, particularly in complex Real-Time Strategy (RTS) games such as StarCraft II and Dota 2, serving as benchmarks for testing RL algorithms' robustness and safety. Despite these successes, DRL algorithms face challenges, including high computational costs and a lack of safety-aware approaches. Training these algorithms requires extensive computational resources, leading to a significant divide between algorithms developed on supercomputers and those feasible on standard hardware. This also raises sustainability concerns due to increased CO2 emissions. Additionally, most RL algorithms are risk-neutral, limiting their deployment in safety-critical systems. We present a novel model-based DRL approach, the Safe Observations Rewards Actions Costs Learning Ensemble (S-ORACLE), to address these challenges. S-ORACLE balances robust safety awareness with minimized risk and computational efficiency. Empirical validation across complex game environments (Deep RTS, ELF: MiniRTS, MicroRTS, Deep Warehouse, and StarCraft II) demonstrates that S-ORACLE outperforms state-of-the-art methods by significantly improving safety performance, reducing computational costs, and lowering environmental impact, while maintaining high efficiency and adaptability in training.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2024.120980