Diagnosing Reinforcement Learning for Traffic Signal Control
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | With the increasing availability of traffic data and advances in deep
reinforcement learning techniques, there is an emerging trend of employing
reinforcement learning (RL) for traffic signal control. A key question for
applying RL to traffic signal control is how to define the reward and state.
The ultimate objective in traffic signal control is to minimize travel
time, which is difficult to optimize directly. Hence, existing studies often
define the reward as an ad-hoc weighted linear combination of several traffic
measures. However, there is no guarantee that travel time will be minimized
under such a reward. In addition, recent RL approaches use more complicated state
representations (e.g., images) to describe the full traffic situation. However, none of
the existing studies has discussed whether such a complex state representation
is necessary. This extra complexity may lead to a significantly slower learning
process without necessarily bringing a significant performance gain.
In this paper, we propose to re-examine the RL approaches through the lens of
classic transportation theory. We ask the following questions: (1) How should
we design the reward so that minimizing travel time is guaranteed? (2)
How should we design a state representation that is concise yet sufficient to obtain
the optimal solution? Our proposed method, LIT, is theoretically supported by the
classic traffic signal control methods in the transportation field. LIT has a very
simple state and reward design and can thus serve as a building block for future
RL approaches to traffic signal control. Extensive experiments on both
synthetic and real datasets show that our method significantly outperforms the
state-of-the-art traffic signal control methods. |
---|---|
DOI: | 10.48550/arxiv.1905.04716 |
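
The summary notes that existing studies often define the reward as an ad-hoc weighted linear combination of several traffic measures. The following is a minimal sketch of what such a reward might look like, not the reward used by LIT or by any specific prior study. The measure names (`queue_length`, `delay`, `waiting_time`), the weights, and the two example outcomes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrafficMeasures:
    """Hypothetical per-step measurements at one intersection."""
    queue_length: float  # number of waiting vehicles
    delay: float         # seconds
    waiting_time: float  # seconds


def weighted_reward(m: TrafficMeasures, weights: tuple) -> float:
    """Negative weighted sum of the measures: more congestion, lower reward."""
    w_queue, w_delay, w_wait = weights
    return -(w_queue * m.queue_length + w_delay * m.delay + w_wait * m.waiting_time)


# Two made-up outcomes that could follow two different signal actions.
after_a = TrafficMeasures(queue_length=10, delay=20.0, waiting_time=5.0)
after_b = TrafficMeasures(queue_length=4, delay=30.0, waiting_time=12.0)


def preferred(weights: tuple) -> str:
    """Return which outcome the reward ranks higher under the given weights."""
    if weighted_reward(after_a, weights) > weighted_reward(after_b, weights):
        return "A"
    return "B"


if __name__ == "__main__":
    print(preferred((1.0, 0.1, 0.1)))  # weights emphasising queue length
    print(preferred((0.1, 1.0, 1.0)))  # weights emphasising delay and waiting time
```

With these made-up numbers, the two weightings rank the outcomes differently, which illustrates the concern raised in the summary: the learned behaviour depends on hand-picked weights, and nothing in the formula ties the reward to travel time.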