Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Combinatorial optimization has found applications in numerous fields, from
aerospace to transportation planning and economics. The goal is to find an
optimal solution among a finite set of possibilities. The well-known challenge
one faces with combinatorial optimization is the state-space explosion problem:
the number of possibilities grows exponentially with the problem size, which
makes solving intractable for large problems. In recent years, deep
reinforcement learning (DRL) has shown promise for designing good
heuristics dedicated to solving NP-hard combinatorial optimization problems.
However, current approaches have two shortcomings: (1) they mainly focus on the
standard traveling salesman problem and they cannot be easily extended to
other problems, and (2) they only provide an approximate solution with no
systematic ways to improve it or to prove optimality. In another context,
constraint programming (CP) is a generic tool to solve combinatorial
optimization problems. Based on a complete search procedure, it will always
find the optimal solution given enough execution time. A
critical design choice that makes CP non-trivial to use in practice is the
branching decision, which directs how the search space is explored. In this work,
we propose a general and hybrid approach, based on DRL and CP, for solving
combinatorial optimization problems. The core of our approach is based on a
dynamic programming formulation that acts as a bridge between both techniques.
We show experimentally that our solver is efficient at solving two challenging
problems: the traveling salesman problem with time windows, and the 4-moments
portfolio optimization problem. Results obtained show that the framework
introduced outperforms the stand-alone RL and CP solutions, while being
competitive with industrial solvers. |
---|---|
DOI: | 10.48550/arxiv.2006.01610 |
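
The summary describes a dynamic programming formulation that links a complete search with a branching heuristic. The following is a minimal illustrative sketch of that idea for the traveling salesman problem with time windows, not the authors' implementation. It assumes that city 0 is the depot, that travel time equals distance, that the objective is total travel distance, and that no window is enforced on the return to the depot. The names `transitions`, `nearest_first`, and `solve_tsptw` are hypothetical. The `order` argument is the branching decision: here it is a simple nearest-first rule, standing in for the place where a learned policy could rank candidate moves.

```python
from typing import Callable, FrozenSet, Iterator, List, Optional, Sequence, Tuple

# DP state: (current city, cities still to visit, current time)
State = Tuple[int, FrozenSet[int], float]
# A move: (next city, resulting state, travel cost of the step)
Move = Tuple[int, State, float]


def transitions(state: State, dist: Sequence[Sequence[float]],
                windows: Sequence[Tuple[float, float]]) -> Iterator[Move]:
    """Yield every feasible DP transition out of `state`."""
    city, todo, time = state
    for nxt in todo:
        # Arriving early is allowed: wait until the window opens.
        arrival = max(time + dist[city][nxt], windows[nxt][0])
        if arrival <= windows[nxt][1]:
            yield nxt, (nxt, todo - {nxt}, arrival), dist[city][nxt]


def nearest_first(state: State, moves: List[Move]) -> List[Move]:
    """Placeholder branching heuristic (assumption): try the closest city first."""
    return sorted(moves, key=lambda m: m[2])


def solve_tsptw(dist: Sequence[Sequence[float]],
                windows: Sequence[Tuple[float, float]],
                order: Callable[[State, List[Move]], List[Move]] = nearest_first,
                ) -> Tuple[float, Optional[List[int]]]:
    """Complete depth-first branch-and-bound over the DP state space.

    Returns (best cost, best tour), or (inf, None) if no feasible tour exists.
    """
    n = len(dist)
    best = {"cost": float("inf"), "tour": None}

    def dfs(state: State, cost: float, tour: List[int]) -> None:
        city, todo, _ = state
        if cost >= best["cost"]:
            return  # bound: valid because distances are non-negative
        if not todo:
            total = cost + dist[city][0]  # close the tour at the depot
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour + [0]
            return
        moves = list(transitions(state, dist, windows))
        for nxt, nxt_state, step in order(state, moves):
            dfs(nxt_state, cost + step, tour + [nxt])

    dfs((0, frozenset(range(1, n)), 0.0), 0.0, [0])
    return best["cost"], best["tour"]
```

Because the search is exhaustive up to pruning, the heuristic in this sketch only changes how quickly a good incumbent is found, not the final answer. This mirrors the summary's point that the branching decision directs the exploration while the complete search retains the ability to prove optimality.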