Improved learning efficiency of deep Monte-Carlo for complex imperfect-information card games
Published in: Applied Soft Computing 2024-06, Vol. 158, p. 111545, Article 111545
Main authors: ,
Format: Article
Language: eng
Subjects:
Online Access: Full text
Abstract: Deep Reinforcement Learning (DRL) has achieved considerable success in games involving perfect and imperfect information, such as Go, Texas Hold’em, Stratego, and DouDiZhu. Nevertheless, training a state-of-the-art model for complex imperfect-information card games like DouDiZhu and Big2 remains resource- and time-intensive. To address this challenge, this paper introduces two innovative methods: the Opponent Model and Optimized Deep Monte-Carlo (ODMC). These methods are designed to improve the training efficiency of Deep Monte-Carlo (DMC) for imperfect-information card games. The Opponent Model predicts hidden information, allowing the agent to learn faster in DMC than the original training, which uses only observed information as input features. In ODMC, Minimum Combination Search (MCS) is a heuristic search algorithm based on dynamic programming; it calculates the minimum combination of actions in the current state, and ODMC uses MCS to filter suboptimal actions in each state. This reduces the action space considered by DMC, resulting in faster training that focuses on evaluating the most promising actions. The effectiveness of the proposed approach is evaluated on two complex imperfect-information card games: DouDiZhu and Big2. Ablation experiments evaluate both the Opponent Model (D+OM and B+OM) and ODMC (D+ODMC and B+ODMC), along with their combined variants (D+OMODMC and B+OMODMC). Furthermore, D+OMODMC and B+OMODMC are compared with state-of-the-art DouDiZhu and Big2 artificial intelligence (AI) programs, respectively. The experimental results demonstrate that the proposed methods achieve performance comparable to the original DMC with only 25.5% of the training time on the same device. These findings are valuable for reducing both the equipment requirements and the training time for complex imperfect-information card games.
Highlights:
• Propose a novel Opponent Model and ODMC to improve the learning efficiency of DMC.
• Opponent Model predicts hidden information, enhancing learning.
• ODMC uses MCS to filter suboptimal actions, speeding up training.
• Proposed methods achieve comparable performance with reduced training time.
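To make the Opponent Model idea concrete, the sketch below shows one way a hidden-information predictor could be attached to a DMC agent's input pipeline: a small network estimates the unseen opponents' card counts and its output is concatenated to the observed features. This is a minimal PyTorch sketch under assumed feature sizes (obs_dim, num_ranks, num_opponents) and invented helper names (OpponentModel, augment_features); it is not the paper's actual architecture or feature encoding.

```python
# Hypothetical sketch of the Opponent Model idea from the abstract: predict
# hidden information (per-rank card counts of the unseen hands) from the
# observation, and append the prediction to the observed features fed to DMC.
# All layer sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn


class OpponentModel(nn.Module):
    """Predicts per-rank card counts of the hidden hands from the observation."""

    def __init__(self, obs_dim=372, num_ranks=15, num_opponents=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        # One head per hidden hand, each predicting a count per rank.
        self.heads = nn.ModuleList(
            [nn.Linear(256, num_ranks) for _ in range(num_opponents)]
        )

    def forward(self, obs):
        h = self.backbone(obs)
        return torch.cat([head(h) for head in self.heads], dim=-1)


def augment_features(obs, opponent_model):
    """Append predicted hidden-hand features to the observed features, giving
    the DMC value network a richer input than observations alone."""
    with torch.no_grad():
        predicted_hidden = opponent_model(obs)
    return torch.cat([obs, predicted_hidden], dim=-1)
```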
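As a rough illustration of Minimum Combination Search, the sketch below uses memoized dynamic programming to compute the minimum number of actions needed to play out a hand, and then filters the legal actions down to those lying on such a shortest play-out, which is how ODMC could shrink the action space DMC must evaluate. The hand encoding and the generate_legal_actions placeholder are simplified assumptions (rank multiples only, no chains or kickers) and not the authors' implementation.

```python
# Hypothetical sketch of MCS-style action filtering: a memoized search over
# hand states computes the minimum number of actions to empty the hand, and
# only actions that keep the play-out on a shortest path are retained.
from functools import lru_cache


def generate_legal_actions(hand):
    """Placeholder move generator: each action plays all copies of one rank.
    A real DouDiZhu/Big2 generator would also enumerate chains, kickers, etc."""
    return [(rank, count) for rank, count in enumerate(hand) if count > 0]


def apply_action(hand, action):
    rank, count = action
    new_hand = list(hand)
    new_hand[rank] -= count
    return tuple(new_hand)


@lru_cache(maxsize=None)
def min_combinations(hand):
    """Minimum number of actions required to empty `hand` (tuple of counts per rank)."""
    if sum(hand) == 0:
        return 0
    return 1 + min(min_combinations(apply_action(hand, a))
                   for a in generate_legal_actions(hand))


def filter_actions(hand, legal_actions):
    """Keep only actions that do not increase the minimum-combination count."""
    best = min_combinations(hand)
    return [a for a in legal_actions
            if min_combinations(apply_action(hand, a)) == best - 1]
```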
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2024.111545