Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Many complex problems encountered in both production and daily life can be
conceptualized as combinatorial optimization problems (COPs) over graphs.
In recent years, reinforcement learning (RL) based models have emerged as a
promising direction; they treat COP solving as a heuristic learning
problem. However, current RL models based on finite-horizon MDPs have inherent
limitations: they cannot explore adequately to improve solutions
at test time, which may be necessary given the complexity of NP-hard
optimization tasks. Some recent attempts address this issue through reward
design and state feature engineering, which are tedious and ad hoc. In this
work, we instead propose a much simpler but more effective technique, named
gauge transformation (GT). The technique originates from physics, but is
very effective in enabling RL agents to explore and continuously improve their
solutions at test time. Moreover, GT is very simple: it can be implemented
in fewer than 10 lines of Python code and can be applied to the vast majority
of RL models. Experimentally, we show that traditional RL models with the GT
technique achieve state-of-the-art performance on the MaxCut problem.
Furthermore, since GT is independent of any particular RL model, it can be seamlessly
integrated into various RL frameworks, paving the way for these models to explore more
effectively when solving general COPs. |
---|---|
DOI: | 10.48550/arxiv.2404.04661 |
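
The summary says GT comes from physics and fits in fewer than 10 lines of Python, but this record does not reproduce the authors' code. As a rough illustration only, the sketch below shows the standard Ising-model gauge transformation, which is assumed here to be the idea the summary refers to. The function names, the energy convention, and the random test graph are all hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def ising_energy(J, s):
    """E = -1/2 * sum_ij J_ij s_i s_j, for symmetric J with zero diagonal (assumed convention)."""
    return -0.5 * s @ J @ s

def gauge_transform(J, s, t):
    """Standard Ising gauge transformation with t in {-1, +1}^n:
    J'_ij = J_ij * t_i * t_j and s'_i = s_i * t_i."""
    return J * np.outer(t, t), s * t

# Hypothetical toy instance: a small random unweighted graph and a random spin assignment.
rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
J = (W + W.T).astype(float)
s = rng.choice([-1, 1], size=n)

# Choosing t = s maps the current assignment to the all-ones configuration
# while leaving the energy unchanged.
J_new, s_new = gauge_transform(J, s, t=s)
```

Because every t_i squares to 1, the energy of `s_new` under `J_new` equals the energy of `s` under `J`. Under the assumption above, this is what would let an agent restart from a fixed reference configuration on the transformed couplings without losing the solution it has already found.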