A deep reinforcement learning approach for solving the Traveling Salesman Problem with Drone

Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Transportation research. Part C, Emerging technologies Emerging technologies, 2023-03, Vol.148, p.103981, Article 103981
Hauptverfasser: Bogyrbayeva, Aigerim, Yoon, Taehyun, Ko, Hanbum, Lim, Sungbin, Yun, Hyokun, Kwon, Changhyun
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination—a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoder fails to make such coordination between vehicles. We propose a hybrid model that uses an attention encoder and a Long Short-Term Memory (LSTM) network decoder, in which the decoder’s hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for the coordinated routing of multiple vehicles than the attention-based model. The proposed model demonstrates comparable results as the operations research baseline methods. •We solve the Traveling Salesman Problem with Drone using reinforcement learning.•We present the problem as a Markov Decision Process.•We propose an attention encoder-LSTM decoder hybrid model.•We devise the Distribute-and-Follow Policy Gradient training algorithm.•The proposed model demonstrates comparable results as the operations research baseline methods.
ISSN:0968-090X
1879-2359
DOI:10.1016/j.trc.2022.103981