Order dispatching for an ultra-fast delivery service via deep reinforcement learning

This paper proposes a real-life application of deep reinforcement learning to address the order dispatching problem of a Turkish ultra-fast delivery company, Getir. Before applying off-the-shelf reinforcement learning methods, we define the specific problem at Getir and one of the solutions the comp...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2022-03, Vol.52 (4), p.4274-4299
Hauptverfasser: Kavuk, Eray Mert, Tosun, Ayse, Cevik, Mucahit, Bozanta, Aysun, Sonuç, Sibel B., Tutuncu, Mehmetcan, Kosucu, Bilgin, Basar, Ayse
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper proposes a real-life application of deep reinforcement learning to address the order dispatching problem of a Turkish ultra-fast delivery company, Getir. Before applying off-the-shelf reinforcement learning methods, we define the specific problem at Getir and one of the solutions the company has implemented. We discuss the novel aspects of Getir’s problem compared to the state-of-the-art order dispatching studies and highlight the limitations of Getir’s solution. The overall aim of the company is to deliver to as many customers as possible within 10 minutes. The orders arrive throughout the day, and centralized warehouses in the regions decide whether an incoming order should be served or canceled depending on their couriers’ shifts and status. We use Deep Q-networks to learn the actions of warehouses, i.e., accepting or canceling an order, directly from state dimensions using reinforcement learning. We design the networks with two different rewards. We conduct empirical analyses using real-life data provided by Getir to generate training samples and to assess the models’ performance during a selected 30-day period with a total of 9880 orders. The results indicate that our proposed models are able to generate policies that outperform the rule-based heuristic employed in practice.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-021-02610-0