An actor-critic deep reinforcement learning approach for metro train scheduling with rolling stock circulation under stochastic demand


Bibliographic details
Published in: Transportation research. Part B: methodological 2020-10, Vol.140, p.210-235
Main authors: Ying, Cheng-shuo; Chow, Andy H.F.; Chin, Kwai-Sang
Format: Article
Language: English
Description
Summary:
• A novel model of loop metro service driven by stochastic demand of general distribution.
• A Markov decision process for optimal scheduling with circulation of limited rolling stock.
• An actor-critic deep reinforcement learning framework with an off-policy training algorithm.
• A case study demonstrating the efficiency and robustness of the proposed control system.

This paper presents a novel actor-critic deep reinforcement learning approach for metro train scheduling with circulation of limited rolling stock. The scheduling problem is modeled as a Markov decision process driven by stochastic passenger demand. As in most dynamic optimization problems, the complexity of the scheduling process grows exponentially with the number of states, decisions, and uncertainties involved. This study aims to address this ‘curse of dimensionality’ by adopting an actor-critic deep reinforcement learning solution framework. The framework simplifies the evaluation of and search for potential optimal solutions by parameterizing the original state and decision spaces with artificial neural networks. A deep deterministic policy gradient algorithm is developed to train the artificial neural networks on simulated system transitions before the actor-critic agent is applied to online schedule control. The proposed approach is tested on a real-world scenario configured with data collected from the Victoria Line of the London Underground, UK. Experimental results illustrate the advantages of the proposed method over a range of established meta-heuristics in terms of computing time, system efficiency, and robustness under different stochastic environments. This study innovates urban transit operations with state-of-the-art computer science and dynamic optimization techniques.
ISSN: 0191-2615, 1879-2367
DOI: 10.1016/j.trb.2020.08.005