PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search

Bibliographic Details
Published in: IEEE robotics and automation letters, 2022-10, Vol. 7 (4), p. 8869-8876
Main Authors: Sheng, Wenda; Guo, Hongliang; Yau, Wei-Yun; Zhou, Yingjie
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Abstract: This letter presents a new class of multi-robot search problems for a non-adversarial moving target, namely multi-robot reliable search (MuRRS). The term 'reliability' in MuRRS is defined as the expectation of a predefined utility function over the probability density function (PDF) of the target's capture time. We argue that MuRRS subsumes the canonical multi-robot efficient search (MuRES) problem, which minimizes the target's expected capture time, as a special case, and offers the end user a wide range of objective selection options. Since state-of-the-art algorithms usually target the MuRES problem and cannot deliver up-to-standard performance on the various MuRRS objectives, we therefore propose a probability density factorized multi-agent distributional reinforcement learning method, namely PD-FAC, as a unified solution to the MuRRS problem. PD-FAC decomposes the PDF of the multi-robot system's overall value distribution into a set of individual value distributions and guarantees that any reliability objective defined as a function of the overall system's value distribution can be linearly approximated by the same reliability metric defined over each agent's individual value distribution. In this way, the individual global maximum (IGM) principle is satisfied for all the predefined reliability metrics: when each reinforcement learning agent executes the individual policy that maximizes its own reliability metric, the system's overall reliability performance is also maximized. We evaluate and compare the performance of PD-FAC against state-of-the-art methods in a range of canonical multi-robot search environments with satisfactory results, and also deploy PD-FAC to a real multi-robot system for non-adversarial moving target search.
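The abstract's central definition, reliability as the expectation of a utility function over the capture-time PDF, can be illustrated with a short Monte-Carlo sketch. This is not the paper's implementation; the utility functions, the exponential capture-time distribution, and all names below are illustrative assumptions. It shows how the MuRES objective (expected capture time) drops out as the special case `utility(t) = -t`, while a deadline-based objective is just a different utility under the same definition.

```python
import numpy as np

# Illustrative sketch (assumed names/distributions, not the paper's code):
# 'reliability' = E[utility(T)] over the PDF of the target's capture time T.
rng = np.random.default_rng(0)

# Samples standing in for the capture-time distribution (assumed exponential,
# mean 10 time steps, purely for illustration).
capture_times = rng.exponential(scale=10.0, size=100_000)

def reliability(utility, samples):
    """Monte-Carlo estimate of E[utility(T)] over capture-time samples."""
    return float(np.mean(utility(samples)))

# MuRES as a special case: utility(t) = -t, so maximizing reliability
# is equivalent to minimizing the expected capture time.
mures_objective = reliability(lambda t: -t, capture_times)

# A different MuRRS objective: probability of capture before a deadline,
# i.e. utility(t) = 1 if t <= deadline else 0.
deadline_objective = reliability(lambda t: (t <= 5.0).astype(float),
                                 capture_times)
```

Under this framing, swapping the utility function changes the search objective without changing the underlying value distribution, which is what motivates learning the full capture-time PDF rather than only its mean.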
ISSN: 2377-3766
DOI:10.1109/LRA.2022.3188904