An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2022-09, Vol. 33 (9), pp. 4742-4756
Main authors:	Liang, Enming; Wen, Kexin; Lam, William H. K.; Sumalee, Agachai; Zhong, Renxin
Format:	Article
Language:	English
Subjects:	Algorithms; Car sharing; Decisions; Deep reinforcement learning (RL); Dispatching; Driving conditions; Machine learning; Markov processes; multiagent system; online vehicle routing; Programming; Public transportation; Real time operation; Real-time systems; Reinforcement; Relocation; Roads; Simulation; stochastic network traffic; Stochasticity; Taxicabs; Traffic congestion; Traffic planning; vehicle dispatching; Vehicle dynamics; Vehicles

Description
Summary:	Balancing the supply and demand for ride-sourcing companies is a challenging issue, especially with real-time requests and stochastic traffic conditions of large-scale congested road networks. To tackle this challenge, this article proposes a robust and scalable approach that integrates reinforcement learning (RL) and a centralized programming (CP) structure to promote real-time taxi operations. Both real-time order matching decisions and vehicle relocation decisions at the microscopic network scale are integrated within a Markov decision process framework. The RL component learns the decomposed state-value function, which represents the taxi drivers' experience, the offline historical demand pattern, and the traffic network congestion. The CP component plans nonmyopic decisions for drivers collectively under the prescribed system constraints to explicitly realize cooperation. Furthermore, to circumvent sparse reward and sample imbalance problems over the microscopic road network, this article proposes a temporal-difference learning algorithm with prioritized gradient descent and adaptive exploration techniques. A simulator is built and trained with the Manhattan road network and New York City yellow taxi data to simulate the real-time vehicle dispatching environment. Both centralized and decentralized taxi dispatching policies are examined with the simulator. This case study shows that the proposed approach can further improve taxi drivers' profits while reducing customers' waiting times compared to several existing vehicle dispatching algorithms.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2021.3060187
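
Illustrative note: the summary pairs a learned state-value function with a centralized matching step. The sketch below is one way that pairing could look on a toy scale, and it is not the authors' implementation. It swaps in plain tabular TD(0) for the article's prioritized-gradient temporal-difference algorithm and a brute-force search for the centralized program. The integer zone labels, the distance-based pickup cost, the scoring rule, and all names (`td_update`, `match_score`, `centralized_match`, `GAMMA`, `ALPHA`) are assumptions made for illustration.

```python
from itertools import permutations

GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # learning rate (assumed value)


def td_update(values, state, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """One tabular TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    current = values.get(state, 0.0)
    target = reward + gamma * values.get(next_state, 0.0)
    values[state] = current + alpha * (target - current)
    return values[state]


def match_score(values, driver_zone, order, gamma=GAMMA):
    """Hypothetical score of a driver serving an order.

    order is (origin_zone, dest_zone, fare). Zones are integers on a line,
    so the pickup cost is simply the distance between driver and origin.
    """
    origin, dest, fare = order
    pickup_cost = abs(driver_zone - origin)
    future_gain = gamma * values.get(dest, 0.0) - values.get(driver_zone, 0.0)
    return fare - pickup_cost + future_gain


def centralized_match(values, drivers, orders):
    """Pick the joint driver-to-order assignment with the highest total score.

    Brute force over all assignments; assumes len(orders) <= len(drivers)
    and is only practical for a handful of vehicles.
    """
    best_total, best_pairs = float("-inf"), []
    for perm in permutations(range(len(drivers)), len(orders)):
        total = sum(
            match_score(values, drivers[d], orders[o]) for o, d in enumerate(perm)
        )
        if total > best_total:
            best_total = total
            best_pairs = [(d, o) for o, d in enumerate(perm)]
    return best_pairs, best_total


if __name__ == "__main__":
    zone_values = {0: 1.0, 1: 2.0, 2: 0.5, 3: 3.0}  # toy learned values per zone
    drivers = [0, 3]                                # current zone of each driver
    orders = [(1, 2, 10.0), (3, 0, 6.0)]            # (origin, destination, fare)
    pairs, total = centralized_match(zone_values, drivers, orders)
    print("driver/order pairs:", pairs, "total score:", total)
```

Because the score includes the discounted value of the destination zone, the joint assignment accounts for where each driver ends up, which is the nonmyopic aspect the summary mentions. For realistic fleet sizes the brute-force search would have to give way to a proper assignment or integer-programming solver.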