Distributed Policy Evaluation with Fractional Order Dynamics in Multiagent Reinforcement Learning

Bibliographic details
Published in:	Security and communication networks 2021, Vol.2021, p.1-7
Main authors:	Dai, Wei; Wang, Wei; Mao, Zhongtian; Jiang, Ruwen; Nian, Fudong; Li, Teng
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Eigenvalues; Expected values; Lagrange multiplier; Learning; Multiagent systems; Optimization
Online access:	Full text

Description
Abstract:	The main objective of multiagent reinforcement learning is to achieve a global optimal policy. It is difficult to evaluate the value function with high-dimensional state space. Therefore, we transfer the problem of multiagent reinforcement learning into a distributed optimization problem with constraint terms. In this problem, all agents share the space of states and actions, but each agent only obtains its own local reward. Then, we propose a distributed optimization with fractional order dynamics to solve this problem. Moreover, we prove the convergence of the proposed algorithm and illustrate its effectiveness with a numerical example.
ISSN:	1939-0114 1939-0122
DOI:	10.1155/2021/1020466