PD-FAC: Probability Density Factorized Multi-Agent Distributional Reinforcement Learning for Multi-Robot Reliable Search
This letter presents a new range of multi-robot search for a non-adversarial moving target problems, namely multi-robot reliable search (MuRRS). The term 'reliability' in MuRRS is defined as the expectation of a predefined utility function over the probability density function (PDF) of the...
Gespeichert in:
Veröffentlicht in: | IEEE robotics and automation letters 2022-10, Vol.7 (4), p.8869-8876 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This letter presents a new range of multi-robot search for a non-adversarial moving target problems, namely multi-robot reliable search (MuRRS). The term 'reliability' in MuRRS is defined as the expectation of a predefined utility function over the probability density function (PDF) of the target's capture time. We argue that MuRRS subsumes the canonical multi-robot efficient search (MuRES) problem, which minimizes the target's expected capture time, as its special case, and offers the end user with a wide range of objective selection options. Since state-of-the-art algorithms are usually targeting the MuRES problem, and cannot offer up-to-standard performance to the various MuRRS objectives, we, thereby, propose a probability density factorized multi-agent distributional reinforcement learning method, namely PD-FAC, as a unified solution to the MuRRS problem. PD-FAC decomposes the PDF of the multi-robot system's overall value distribution into a set of individual value distributions and guarantees that any reliability objective defined as a function of the overall system's value distribution can be linearly approximated by the same reliability metric defined over the agent's individual value distribution. In this way, the individual global maximum (IGM) principle is satisfied for all the pre-defined reliability metrics. It means that when each reinforcement learning agent is executing the individual policy, which maximizes its own reliability metric, the system's overall reliability performance is also maximized. We evaluate and compare the performance of PD-FAC with state of the arts in a range of canonical multi-robot search environments with satisfying results, and also deploy PD-FAC to a real multi-robot system for non-adversarial moving target search. |
---|---|
ISSN: | 2377-3766 2377-3766 |
DOI: | 10.1109/LRA.2022.3188904 |