Online food ordering delivery strategies based on deep reinforcement learning

With the rapid development of Online to Offline (O2O) business, millions of transactions are performed on the popular online food ordering platforms each day. Efficient dispatching of orders and dynamic adjustment of delivery routes are critical to the success of the O2O platforms. However, the vast...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2022-04, Vol.52 (6), p.6853-6865
Hauptverfasser: Zou, Guangyu, Tang, Jiafu, Yilmaz, Levent, Kong, Xiangyu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the rapid development of Online to Offline (O2O) business, millions of transactions are performed on the popular online food ordering platforms each day. Efficient dispatching of orders and dynamic adjustment of delivery routes are critical to the success of the O2O platforms. However, the vast volume of transactions and the computational complexity of delivery routes pose significant challenges to the efficient dispatching of orders. The action to dispatch orders and the resulting state transition of couriers form a Markov decision process (MDP). The reinforcement learning technique had proven its capability of dealing with MDP. This paper proposes a Double Deep Q Netwok (DQN) based reinforcement learning framework that gradually tests and learns the order dispatching policy by communicating with an O2O simulation model developed by SUMO. The preliminary experimental results using the real order data demonstrate the effectiveness and efficiency of the proposed Double-DQN based order dispatcher. Also, different state encoding schemes are designed and tested to improve the performance of the Double-DQN based dispatcher.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-021-02750-3