Deep reinforcement learning with credit assignment for combinatorial optimization
Published in: Pattern Recognition, 2022-04, Vol. 124, p. 108466, Article 108466
Format: Article
Language: English
Online access: Full text
Abstract:
• Deep Reinforcement Learning is efficient in solving some combinatorial optimization problems.
• Credit assignment can be used to reduce the high sample complexity of Deep Reinforcement Learning algorithms.
• Model-free and model-based reinforcement learning algorithms can be connected to solve large-scale problems.
• Assigning credits to hundreds of thousands of state-action pairs in a systematic manner accelerates the training process.
Recent advances in Deep Reinforcement Learning (DRL) demonstrate its potential for solving Combinatorial Optimization (CO) problems. DRL shows advantages over traditional methods in both scalability and computational efficiency. However, the DRL problems transformed from CO problems usually have a huge state space, and the main challenge of solving them has shifted from high computational complexity to high sample complexity.
Credit assignment determines the contribution of each internal decision to the final success or failure, and it has been shown to be effective in reducing the sample complexity of the training process. In this paper, we resort to a model-based reinforcement learning method to assign credits for model-free DRL methods. Since heuristic methods play an important role in state-of-the-art solutions for CO problems, we propose using a model to represent this heuristic knowledge and to derive the credit assignment from it. The model-based credit assignment enables the model-free DRL to explore more effectively, and the data collected by the model-free DRL continuously refines the model as training progresses. Extensive experiments on various CO problems with different settings show that our framework outperforms previous state-of-the-art methods in both performance and training efficiency.
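The abstract describes coupling a heuristic-derived credit model with a model-free policy-gradient learner. The sketch below is only a rough illustration of that idea, not the authors' implementation: the 0/1 knapsack instance, the value/weight-ratio seed heuristic, and all function and variable names are assumptions made for the example.

```python
# Illustrative sketch: model-based credit assignment guiding a model-free
# policy-gradient learner on a toy 0/1 knapsack instance (hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)

# Toy CO instance: item values, weights, and a knapsack capacity.
values = rng.uniform(1.0, 10.0, size=10)
weights = rng.uniform(1.0, 5.0, size=10)
capacity = 15.0

def rollout(theta):
    """Build a solution item by item with a Bernoulli policy; return the trajectory and total value."""
    trajectory, remaining, total_value = [], capacity, 0.0
    for i in rng.permutation(len(values)):
        feats = np.array([values[i], weights[i], remaining])
        p = 1.0 / (1.0 + np.exp(-feats @ theta))          # probability of taking item i
        a = int(rng.random() < p and weights[i] <= remaining)  # infeasible picks are masked out
        trajectory.append((feats, p, a))
        if a:
            remaining -= weights[i]
            total_value += values[i]
    return trajectory, total_value

# "Model": a per-step credit estimator seeded with heuristic knowledge
# (prefer high value, low weight); refined from data collected by the learner.
credit_w = np.array([1.0, -1.0, 0.0])

def credit(feats):
    return float(feats @ credit_w)

theta = np.zeros(3)
for episode in range(2000):
    traj, total_value = rollout(theta)
    for feats, p, a in traj:
        # REINFORCE-style update in which the per-step advantage comes from the
        # credit model instead of the sparse terminal return alone.
        adv = credit(feats) if a else -credit(feats)
        theta += 1e-3 * adv * (a - p) * feats
    # Refine the credit model toward the realized outcome of this rollout.
    for feats, p, a in traj:
        target = total_value / len(traj)
        credit_w += 1e-4 * (target - credit(feats)) * feats

print("learned policy weights:", theta)
```

In this sketch the credit model supplies dense per-step feedback that the terminal objective alone would not, while the rollouts gathered by the policy are reused to update the credit model, mirroring the interplay between the model-based and model-free components described above.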
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2021.108466