GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents
Format: Article
Language: English
Abstract: Traffic signal control (TSC) methods based on reinforcement learning
(RL) have been shown to outperform traditional methods. However, most RL
methods are difficult to apply in the real world because of three factors:
input, output, and the cycle-flow relation. First, the input observable in
industry is far more limited than what simulation-based RL methods assume: in
real-world deployments only traffic flow can be collected reliably, whereas
common RL methods need richer observations. Second, for the output action, most
RL methods focus on acyclic control, which real-world signal controllers do not
support. Most importantly, industry standards require a consistent cycle-flow
relation: the cycle length must be non-decreasing as flow increases, and low,
medium, and high flow levels call for different response strategies. RL
methods ignore this requirement. To narrow the gap between RL methods and
industry standards, we propose using industry solutions to guide the RL agent.
Specifically, we design behavior cloning and curriculum learning so that the
agent mimics industry solutions and meets industry requirements, while still
leveraging the exploration and exploitation of RL for better performance. We
prove theoretically that such guidance can reduce the sample complexity of
finding an optimal policy to polynomial in the horizon. Our rigorous
experiments show that our method achieves a good cycle-flow relation and
superior performance.
DOI: 10.48550/arxiv.2407.10811
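To make the cycle-flow requirement described in the abstract more concrete, the following is a minimal Python sketch of how one might audit a controller's (flow, cycle length) outputs against it. The function names `flow_level` and `is_non_decreasing_cycle` and the thresholds `LOW_MAX` and `MEDIUM_MAX` (vehicles per hour) are hypothetical illustrations, not code or values from the paper. The sketch checks only the non-decreasing part of the requirement; it does not test whether the response strategies for the low, medium, and high bands actually differ.

```python
from typing import Sequence

# Hypothetical flow-level thresholds in vehicles per hour; the paper does not
# state these values, they are placeholders for illustration only.
LOW_MAX = 600.0
MEDIUM_MAX = 1200.0


def flow_level(flow: float) -> str:
    """Map a flow value to a 'low', 'medium', or 'high' band (hypothetical thresholds)."""
    if flow <= LOW_MAX:
        return "low"
    if flow <= MEDIUM_MAX:
        return "medium"
    return "high"


def is_non_decreasing_cycle(flows: Sequence[float], cycles: Sequence[float]) -> bool:
    """Return True if the chosen cycle length never drops as flow rises.

    flows[i] is an observed flow and cycles[i] is the cycle length (seconds)
    a controller chose for it. Pairs are sorted by flow (ties by cycle), so
    equal flows with different cycles are not counted as a violation.
    """
    if len(flows) != len(cycles):
        raise ValueError("flows and cycles must have the same length")
    pairs = sorted(zip(flows, cycles))
    # Compare each cycle with the cycle chosen for the next-higher flow.
    return all(prev[1] <= nxt[1] for prev, nxt in zip(pairs, pairs[1:]))


if __name__ == "__main__":
    flows = [300, 800, 1500]
    print([flow_level(f) for f in flows])                 # ['low', 'medium', 'high']
    print(is_non_decreasing_cycle(flows, [60, 90, 120]))  # True
    print(is_non_decreasing_cycle(flows, [60, 120, 90]))  # False
```

An audit of this kind needs only the (flow, cycle) pairs a controller produced, which fits the abstract's observation that flow is the quantity reliably available in real-world deployments.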