Efficient Reward Functions for Adaptive Multi-rover Systems

Bibliographic Details
Main Authors: Tumer, Kagan; Agogino, Adrian
Format: Book chapter
Language: English
Online Access: Full text
Description
Abstract: This chapter focuses on deriving reward functions that allow multiple agents to co-evolve efficient control policies that maximize a system-level reward in noisy and dynamic environments. The solution we present is based on agent rewards satisfying two crucial properties. First, the agent reward function and the global reward function have to be aligned, that is, an agent maximizing its agent-specific reward should also maximize the global reward. Second, the agent has to receive a sufficient "signal" from its reward, that is, an agent's action should have a large influence on its agent-specific reward. Agents using rewards with these two properties will evolve the correct policies quickly. This hypothesis is tested in episodic and non-episodic, continuous-space multi-rover environments in which rovers evolve to maximize a global reward function over all rovers. The environments are dynamic (i.e., they change over time), noisy, and restrict communication between agents. We show that a control policy evolved using agent-specific rewards satisfying the above properties outperforms policies evolved using global rewards by up to 400%. More notably, with a larger number of rovers, or with rovers whose sensors are noisy and communication-limited, the proposed method outperforms the global reward by a higher margin than in noise-free conditions with a small number of rovers.
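
The abstract does not spell out the exact form of the agent-specific reward. One common way to obtain both properties in rover domains of this kind is a difference-style reward D_i = G(z) - G(z_-i), the global reward computed with and without agent i. The sketch below is a hypothetical illustration under that assumption: the rover/POI world, the reward shape, and all names (global_reward, difference_reward, min_dist) are invented here and are not taken from the chapter.

    """Illustrative sketch (not the chapter's formulation): a global reward over
    rover/POI observations and a per-rover difference reward
    D_i = G(all rovers) - G(all rovers except i)."""
    import numpy as np

    def global_reward(rover_xy, poi_xy, poi_val, min_dist=1.0):
        """G: each POI contributes its value divided by the (clipped) distance
        to the closest rover; noise and sensing limits are omitted here."""
        g = 0.0
        for p, v in zip(poi_xy, poi_val):
            d = np.linalg.norm(rover_xy - p, axis=1)   # distance of every rover to this POI
            g += v / max(d.min(), min_dist)            # closest rover "observes" the POI
        return g

    def difference_reward(i, rover_xy, poi_xy, poi_val):
        """D_i: global reward with rover i present minus global reward with rover i removed."""
        without_i = np.delete(rover_xy, i, axis=0)
        return global_reward(rover_xy, poi_xy, poi_val) - global_reward(without_i, poi_xy, poi_val)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        rovers = rng.uniform(0, 10, size=(5, 2))       # 5 rovers in a 10x10 world
        pois = rng.uniform(0, 10, size=(8, 2))         # 8 points of interest
        vals = rng.uniform(1, 5, size=8)

        g = global_reward(rovers, pois, vals)
        d0 = difference_reward(0, rovers, pois, vals)
        print(f"G = {g:.3f},  D_0 = {d0:.3f}")

        # Moving only rover 0 changes G and D_0 by exactly the same amount
        # (alignment), while D_0 excludes the other rovers' contributions,
        # giving rover 0 a stronger learning signal than the raw global reward.
        moved = rovers.copy()
        moved[0] = pois[0]                             # place rover 0 on top of POI 0
        print(f"dG = {global_reward(moved, pois, vals) - g:.3f}, "
              f"dD_0 = {difference_reward(0, moved, pois, vals) - d0:.3f}")

Because G(z_-i) does not depend on agent i's own action, any change agent i makes shifts D_i and G identically, which is what the alignment property requires, while the subtraction removes the part of G that agent i cannot influence, which is what improves the signal.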
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/11691839_11