Automated design of action advising trigger conditions for multiagent reinforcement learning: A genetic programming-based approach


Bibliographic details
Published in: Swarm and Evolutionary Computation, 2024-03, Vol. 85, p. 101475, Article 101475
Main authors: Wang, Tonghao; Peng, Xingguang; Wang, Tao; Liu, Tong; Xu, Demin
Format: Article
Language: English
Description
Abstract: Action advising is a popular and effective approach to accelerating independent multiagent reinforcement learning (MARL), especially in learning systems where all agents learn from scratch and their roles (advisor or advisee) cannot be predefined. The key component of action advising is the trigger condition, which determines when to advise. Previous work has mainly focused on designing novel trigger conditions by hand; because those conditions are often designed heuristically, their performance may be affected by the designers' preferences. To address this, the paper attempts to solve the action advising problem automatically using genetic programming (GP), an evolutionary computation technique. It provides a framework that incorporates GP into action advising, together with a novel population initialization method to enhance performance. Empirical studies demonstrate the effectiveness of the proposed framework. More importantly, thanks to the high transparency of GP, a comprehensive analysis of the results is also conducted. The discussion yields insights into the action advising problem that may guide future work.
ISSN: 2210-6502
DOI: 10.1016/j.swevo.2024.101475
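
The abstract describes a trigger condition as the rule that decides when to advise, and GP as a way to search for such rules automatically. The following is only a minimal sketch of how a trigger condition could be represented as an expression tree of the kind GP typically evolves. The feature names (`advisor_visits`, `advisee_visits`), the function set, and the "advise when the output is positive" convention are illustrative assumptions and are not taken from the paper; the evolutionary loop itself (selection, crossover, mutation, fitness evaluation) is omitted.

```python
import operator

# Primitive functions available to the expression tree (an assumed set,
# not the paper's actual function set).
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "min": min,
    "max": max,
}


def evaluate(tree, features):
    """Recursively evaluate an expression tree.

    A tree is a number (constant), a string (feature name), or a tuple
    (function_name, left_subtree, right_subtree).
    """
    if isinstance(tree, (int, float)):
        return float(tree)
    if isinstance(tree, str):
        return float(features[tree])
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, features), evaluate(right, features))


def should_advise(tree, features):
    """Hypothetical trigger convention: advise when the output is positive."""
    return evaluate(tree, features) > 0.0


# Hand-written candidate for illustration: advise when the advisor has
# visited the state more often than the advisee by a margin above 5.
candidate = ("sub", ("sub", "advisor_visits", "advisee_visits"), 5)

print(should_advise(candidate, {"advisor_visits": 20, "advisee_visits": 3}))  # True
print(should_advise(candidate, {"advisor_visits": 4, "advisee_visits": 3}))   # False
```

Because the candidate is a plain nested tuple, it can be read directly as a formula, which illustrates in a small way the transparency the abstract attributes to GP.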