A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone

Reinforcement learning has recently shown promise in learning high-quality solutions to many combinatorial optimization problems. In particular, attention-based encoder-decoder models are highly effective on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, t...

Bibliographic details
Main authors: Bogyrbayeva, Aigerim; Yoon, Taehyun; Ko, Hanbum; Lim, Sungbin; Yun, Hyokun; Kwon, Changhyun
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Bogyrbayeva, Aigerim
Yoon, Taehyun
Ko, Hanbum
Lim, Sungbin
Yun, Hyokun
Kwon, Changhyun
description Reinforcement learning has recently shown promise in learning high-quality solutions to many combinatorial optimization problems. In particular, attention-based encoder-decoder models are highly effective on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly on the TSP with Drone (TSP-D), which requires coordinated routing of a heterogeneous fleet of vehicles: a truck and a drone. In TSP-D, the two vehicles move in tandem, and one may need to wait at a node for the other to join. A stateless attention-based decoder fails to achieve such coordination between vehicles. We propose a hybrid model that uses an attention encoder and a Long Short-Term Memory (LSTM) network decoder, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model in both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for the coordinated routing of multiple vehicles than the attention-based model. The proposed model achieves results comparable to those of the operations research baseline methods.
doi_str_mv 10.48550/arxiv.2112.12545
format Article
fullrecord <record>
  <control>
    <sourceid>arxiv_GOX</sourceid>
    <recordid>TN_cdi_arxiv_primary_2112_12545</recordid>
    <sourceformat>XML</sourceformat>
    <sourcesystem>PC</sourcesystem>
    <sourcerecordid>2112_12545</sourcerecordid>
    <originalsourceid>FETCH-LOGICAL-a675-4bdad6f8cc72aaf1fb5cda811eba7de72215401f2be05e76158d739cb2a4bde13</originalsourceid>
    <sourcetype>Open Access Repository</sourcetype>
    <iscdi>true</iscdi>
    <recordtype>article</recordtype>
  </control>
  <display>
    <type>article</type>
    <title>A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone</title>
    <source>arXiv.org</source>
    <creator>Bogyrbayeva, Aigerim ; Yoon, Taehyun ; Ko, Hanbum ; Lim, Sungbin ; Yun, Hyokun ; Kwon, Changhyun</creator>
    <identifier>DOI: 10.48550/arxiv.2112.12545</identifier>
    <language>eng</language>
    <subject>Computer Science - Artificial Intelligence ; Computer Science - Learning ; Mathematics - Optimization and Control</subject>
    <creationdate>2021-12</creationdate>
    <rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights>
    <oa>free_for_read</oa>
  </display>
  <links>
    <linktorsrc>$$Uhttps://arxiv.org/abs/2112.12545$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc>
    <backlink>$$Uhttps://doi.org/10.48550/arXiv.2112.12545$$DView paper in arXiv$$Hfree_for_read</backlink>
  </links>
  <facets>
    <collection>arXiv Computer Science</collection>
    <collection>arXiv Mathematics</collection>
    <collection>arXiv.org</collection>
  </facets>
  <addata>
    <genre>article</genre>
    <date>2021-12-21</date>
    <doi>10.48550/arxiv.2112.12545</doi>
  </addata>
</record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2112.12545
language eng
recordid cdi_arxiv_primary_2112_12545
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
Mathematics - Optimization and Control
title A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T21%3A51%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Deep%20Reinforcement%20Learning%20Approach%20for%20Solving%20the%20Traveling%20Salesman%20Problem%20with%20Drone&rft.au=Bogyrbayeva,%20Aigerim&rft.date=2021-12-21&rft_id=info:doi/10.48550/arxiv.2112.12545&rft_dat=%3Carxiv_GOX%3E2112_12545%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true