Solving biobjective traveling thief problems with multiobjective reinforcement learning

Bibliographic details
Published in:	Applied Soft Computing, 2024-08, Vol. 161, Article 111751
Main authors:	Santiyuda, Gemilang; Wardoyo, Retantyo; Pulungan, Reza
Format:	Article
Language:	English
Subjects:	Attention mechanism; Deep reinforcement learning; Multiobjective; Pointer network; Traveling thief problem
Online access:	Full text

Description
Abstract:	This study proposes an end-to-end multiobjective reinforcement learning (MORL) approach to solve the biobjective traveling thief problem (TTP). A TTP involves a thief visiting cities and selecting items to maximize profit while minimizing travel time within a knapsack’s capacity. The study evaluates combinations of two architectures, namely the pointer network (PN) and attention mechanism (AM), with three MORL methods: deep reinforcement learning multiobjective algorithm (DRLMOA), multi-sample Pareto hypernetwork (PHN), and manifold-based policy search (MBPS). However, PN and AM cannot be directly used to predict two different sequences simultaneously: the city tour and the item selection. Therefore, a solution encoding and decoding scheme is proposed to solve TTP without substantially modifying PN and AM. The methods are trained only on small, randomly generated problem instances based on Eil76 instances, and their performance is evaluated on various problem instances. The state-of-the-art non-dominated sorting-based customized random-key genetic algorithm (NDS-BRKGA) serves as the baseline. The experimental study demonstrates a competitive performance of the proposed methods compared to the baseline, particularly in instances with a high number of items. The proposed methods, especially PN-DRLMOA and AM-DRLMOA, also show promising generalization capabilities on different and larger graphs. Lastly, the proposed MORL methods significantly outperform NDS-BRKGA in terms of the solution generation running time.
• A multiobjective RL approach is proposed for the biobjective traveling thief problem.
• Combinations of three multiobjective RL methods and two architectures are proposed.
• An analysis of the preference-based multiobjective RL methods is conducted.
• The methods show promising performance and generalizability compared to NDS-BRKGA.
• The proposed methods are compared based on solution quality and training time.
ISSN:	1568-4946, 1872-9681
DOI:	10.1016/j.asoc.2024.111751
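
Illustrative sketch:	The two objectives named in the abstract (travel time to minimize, profit to maximize, subject to a knapsack capacity) can be made concrete with a small evaluator. The Python sketch below is not the authors' encoding scheme or any of their MORL methods. It assumes the speed model commonly used in the TTP literature, in which the thief slows down linearly from `v_max` to `v_min` as the knapsack fills; this record does not state that model. The function names, parameters, and toy data are all hypothetical.

```python
import math


def evaluate_ttp(tour, picked, coords, items, capacity, v_min=0.1, v_max=1.0):
    """Return (travel_time, profit) for one candidate TTP solution.

    tour     : list of city indices, starting at the thief's home city
    picked   : set of item indices the thief takes
    coords   : coords[c] is the (x, y) position of city c
    items    : items[i] is a (city, profit, weight) tuple
    capacity : knapsack capacity (total picked weight must not exceed it)

    Assumption (not stated in the record): speed drops linearly from
    v_max (empty knapsack) to v_min (full knapsack).
    """
    weight_at = {}  # total picked weight per city
    profit = 0.0
    for i in picked:
        city, p, w = items[i]
        weight_at[city] = weight_at.get(city, 0.0) + w
        profit += p
    if sum(weight_at.values()) > capacity:
        raise ValueError("knapsack capacity exceeded")

    load = 0.0
    travel_time = 0.0
    n = len(tour)
    for k in range(n):
        here, nxt = tour[k], tour[(k + 1) % n]  # the tour returns home
        load += weight_at.get(here, 0.0)        # pick up items, then travel
        speed = v_max - (v_max - v_min) * load / capacity
        travel_time += math.dist(coords[here], coords[nxt]) / speed
    return travel_time, profit


def dominates(a, b):
    """True if a = (time, profit) is no worse than b in both objectives
    and strictly better in at least one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def pareto_front(points):
    """Keep only the non-dominated (time, profit) pairs."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy instance (hypothetical data): three cities, two items.
coords = [(0, 0), (3, 4), (3, 0)]
items = [(1, 10.0, 5.0), (2, 4.0, 5.0)]
tour = [0, 1, 2]
candidates = [
    evaluate_ttp(tour, s, coords, items, capacity=10.0)
    for s in (set(), {0}, {1}, {0, 1})
]
front = pareto_front(candidates)
```

On this toy instance every additional item raises both profit and travel time, so no item selection dominates another. That trade-off is why the problem is treated as biobjective rather than collapsed into a single score.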