Repot: Transferable Reinforcement Learning for Quality-Centric Networked Monitoring in Various Environments
Published in: | IEEE Access 2021, Vol. 9, p. 147280-147294 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Collecting and monitoring data with low latency from numerous sensing devices is a key foundation of networked cyber-physical applications such as industrial process control, intelligent traffic control, and networked robots. Because delayed data updates can degrade the quality of networked monitoring, it is desirable to continuously maintain the optimal setting on sensing devices in terms of transmission rates and bandwidth allocation, taking into account application requirements as well as time-varying conditions of the underlying network environments. In this paper, we adapt deep reinforcement learning (RL) to obtain a bandwidth allocation policy for networked monitoring. We present Repot, a transferable RL model in which a policy trained in an easy-to-learn network environment can be readily adjusted to various target network environments. Specifically, Repot employs flow embedding and action shaping schemes that enable the systematic adaptation of a bandwidth allocation policy to the conditions of a target environment. Through experiments with the NS-3 network simulator, we show that Repot achieves stable and high monitoring performance across different network conditions, e.g., outperforming other heuristics and learning-based solutions by 14.5% to 20.8% in quality-of-experience (QoE) for a target network environment. We also demonstrate sample-efficient adaptation in Repot, which uses only 6.25% of the samples required to train a model from scratch. Finally, we present a case study with the SUMO mobility simulator and verify the benefits of Repot in practical scenarios, showing performance gains over the other solutions, e.g., 6.5% at urban scale and 12.6% at suburban scale. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3125008 |