FRL-FI: Transient Fault Analysis for Federated Reinforcement Learning-Based Navigation Systems
Main authors:
Format: Article
Language: English
Abstract: Swarm intelligence is being increasingly deployed in autonomous systems, such
as drones and unmanned vehicles. Federated reinforcement learning (FRL), a key
swarm intelligence paradigm where agents interact with their own environments
and cooperatively learn a consensus policy while preserving privacy, has
recently shown potential advantages and gained popularity. However, transient
faults are becoming more frequent in hardware as technology nodes continue to
scale, and they can pose threats to FRL systems. Meanwhile, conventional
redundancy-based protection methods are challenging to deploy on
resource-constrained edge applications. In this paper, we experimentally
evaluate the fault tolerance of FRL navigation systems at various scales with
respect to fault models, fault locations, learning algorithms, layer types,
communication intervals, and data types at both training and inference stages.
We further propose two cost-effective fault detection and recovery techniques
that can achieve up to 3.3x improvement in resilience with ...
DOI: 10.48550/arxiv.2203.07276
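The abstract names transient faults, data types, a consensus policy, and lightweight detection and recovery without spelling out the mechanics. The sketch below illustrates those ideas under stated assumptions; it is not the authors' fault injector or either of their two techniques, which the abstract does not describe. It assumes a single bit flip in an IEEE 754 float32 value as the transient fault model and an equal-weight, FedAvg-style element-wise mean as the consensus step. It also assumes a hypothetical range-check detector (`range_check_recover`, with an arbitrary `bound` of 10.0) that rolls corrupted values back to the previous consensus. The helper names and the three-agent parameter values are illustrative only.

```python
import math
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` (stored as IEEE 754 float32) with a single bit inverted.

    Bit 0 is the least significant mantissa bit, bits 23-30 hold the
    exponent, and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # the transient fault: exactly one bit flips
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw))
    return faulty


def federated_average(agent_params: list[list[float]]) -> list[float]:
    """Assumed consensus step: equal-weight element-wise mean of all agents."""
    n = len(agent_params)
    return [sum(values) / n for values in zip(*agent_params)]


def range_check_recover(params: list[float], previous: list[float],
                        bound: float) -> list[float]:
    """Hypothetical detector/recovery: values that are NaN/inf or exceed
    `bound` in magnitude are treated as corrupted and rolled back."""
    return [
        new if math.isfinite(new) and abs(new) <= bound else old
        for new, old in zip(params, previous)
    ]


# Toy parameters for three agents (illustrative values only).
agents = [[0.5, -0.25], [0.75, -0.25], [1.0, -0.25]]
previous_consensus = federated_average(agents)  # fault-free round: [0.75, -0.25]

# Inject a fault: flip the highest exponent bit of agent 0's first weight.
agents[0][0] = flip_bit_float32(agents[0][0], 30)  # 0.5 becomes 2**127

faulty_consensus = federated_average(agents)
recovered = range_check_recover(faulty_consensus, previous_consensus, bound=10.0)
print(faulty_consensus[0] > 1e37, recovered)  # True [0.75, -0.25]
```

The example shows why data type and bit position matter. A flip in a high exponent bit turns 0.5 into roughly 1.7e38, and averaging spreads that value into the shared consensus. A mantissa flip would only nudge the weight slightly. Rolling back to the last good consensus is just one cheap, redundancy-free recovery policy chosen for illustration.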