Learning From Oracle Demonstrations: A New Approach to Develop Autonomous Intersection Management Control Algorithms Based on Multiagent Deep Reinforcement Learning
Saved in:
| Published in: | IEEE Access, 2022, Vol. 10, pp. 53601-53613 |
|---|---|
| Main authors: | , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
| Abstract: | Worldwide, many companies are working towards safe and innovative control systems for Autonomous Vehicles (AVs). A key component is Autonomous Intersection Management (AIM) systems, which operate at the level of traffic intersections and manage the right-of-way for AVs, thereby improving flow and safety. AIM traditionally uses control policies based on simple rules. However, Deep Reinforcement Learning (DRL) can provide advanced control policies with the advantage of proactively reacting to and forecasting hazardous situations. The main drawback of DRL is the training time, which is short for simple tasks but considerable for real-world problems with multiple agents. Learning from Demonstrations (LfD) emerged to solve this problem, significantly speeding up training and easing the exploration problem. The challenge is that LfD requires an expert to extract new demonstrations. Therefore, in this paper, we propose the use of an agent, previously trained by imitation learning, to act as an expert to leverage LfD. We named this new agent Oracle, and our new approach was called Learning from Oracle Demonstrations (LfOD). We implemented this novel method over the DRL TD3 algorithm, incorporating significant changes to TD3 that allowed the use of Oracle demonstrations. The complete version was called TD3fOD. The results obtained in the AIM training scenario showed that TD3fOD notably improves the learning process compared with TD3 and DDPGfD, speeding up learning by a factor of 5-6, while the policy found offered both significantly lower variance and better learning ability. The testing scenario also showed a significant improvement in multiple key performance metrics compared with other vehicle control techniques on AIM, such as reducing waiting time by more than 90% and significantly decreasing fuel or electricity consumption and emissions, highlighting the benefits of LfOD. |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2022.3175493 |
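The abstract describes seeding an off-policy DRL learner (TD3) with demonstrations produced by a pre-trained Oracle agent. The paper's actual TD3fOD modifications are not detailed in this record; the sketch below only illustrates the general LfD idea of mixing fixed expert transitions into the learner's replay buffer. The class name, `demo_ratio` parameter, and sampling scheme are illustrative assumptions, not the authors' implementation.

```python
import random

class MixedReplayBuffer:
    """Replay buffer mixing agent experience with fixed Oracle (expert)
    demonstrations, in the spirit of Learning from Demonstrations.
    All names and the demo_ratio heuristic are illustrative."""

    def __init__(self, capacity, demonstrations, demo_ratio=0.25):
        self.capacity = capacity
        self.demos = list(demonstrations)  # (s, a, r, s', done) tuples from the Oracle
        self.agent = []                    # transitions collected by the learner
        self.demo_ratio = demo_ratio       # fraction of each batch drawn from demos

    def add(self, transition):
        """Store a transition collected by the learning agent."""
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)              # FIFO eviction of old agent experience

    def sample(self, batch_size):
        """Draw a batch containing both demonstration and agent transitions."""
        n_demo = min(int(batch_size * self.demo_ratio), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch
```

Early in training, when the agent's own buffer is nearly empty, the demonstration share dominates the batches, which is what drives the reported speed-up of LfD-style methods over learning from scratch.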