Reinforcement Learning-Based Multiaccess Control and Battery Prediction With Energy Harvesting in IoT Systems

Bibliographic details
Published in: IEEE Internet of Things Journal, 2019-04, Vol. 6 (2), pp. 2009-2020
Main authors: Chu, Man; Li, Hang; Liao, Xuewen; Cui, Shuguang
Format: Article
Language: English
Description
Abstract: Energy harvesting (EH) is a promising technique for enabling long-term, self-sustaining operation of Internet of Things (IoT) systems. In this paper, we study the joint access control and battery prediction problems in a small-cell IoT system comprising multiple EH user equipments (UEs) and one base station (BS) with limited uplink access channels. Each UE has a rechargeable battery with finite capacity. The system control is modeled as a Markov decision process in which the BS has no complete prior knowledge and must also cope with large state and action spaces. First, to handle the access control problem assuming causal battery and channel state information, we propose a scheduling algorithm that maximizes the uplink transmission sum rate based on reinforcement learning (RL) with deep Q-network enhancement. Second, for the battery prediction problem, with a fixed round-robin access control policy adopted, we develop an RL-based algorithm to minimize the prediction loss (error) without any model knowledge about the energy source and energy arrival process. Finally, the joint access control and battery prediction problem is investigated, where we propose a two-layer RL network to simultaneously maximize the sum rate and minimize the prediction loss: the first layer performs battery prediction, and the second layer generates the access policy based on the output of the first layer. Experimental results show that the three proposed RL algorithms outperform existing benchmarks.
ISSN:2327-4662
DOI:10.1109/JIOT.2018.2872440
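
The abstract mentions a fixed round-robin access control policy for UEs with finite-capacity rechargeable batteries. The following is a minimal Python sketch of that kind of setting, not the authors' model or algorithm: the function names, the uniform energy arrivals, the exponential channel gain, and the fixed transmit energy are all hypothetical choices made only for illustration, and the sketch assumes no more channels than UEs.

```python
import math
import random


def round_robin_schedule(num_ues, num_channels, slot):
    """Return the UE indices granted an uplink channel in this slot.

    Hypothetical helper: assumes num_channels <= num_ues.
    """
    start = (slot * num_channels) % num_ues
    return [(start + k) % num_ues for k in range(num_channels)]


def step_battery(level, harvested, spent, capacity):
    """Finite-capacity battery update: spend first, then harvest, clip at capacity."""
    return min(capacity, max(0.0, level - spent) + harvested)


def simulate(num_ues=4, num_channels=2, num_slots=100,
             capacity=5.0, tx_energy=1.0, seed=0):
    """Run the round-robin policy and return (average sum rate per slot, final batteries)."""
    rng = random.Random(seed)
    battery = [capacity / 2] * num_ues
    total_rate = 0.0
    for t in range(num_slots):
        scheduled = set(round_robin_schedule(num_ues, num_channels, t))
        for i in range(num_ues):
            spent = 0.0
            # A scheduled UE transmits only if it has enough stored energy.
            if i in scheduled and battery[i] >= tx_energy:
                gain = rng.expovariate(1.0)  # assumed fading power gain
                total_rate += math.log2(1.0 + tx_energy * gain)
                spent = tx_energy
            harvested = rng.uniform(0.0, 1.0)  # assumed energy arrival
            battery[i] = step_battery(battery[i], harvested, spent, capacity)
    return total_rate / num_slots, battery


if __name__ == "__main__":
    avg_rate, final_battery = simulate()
    print(f"average sum rate per slot: {avg_rate:.3f}")
    print("final battery levels:", [round(b, 2) for b in final_battery])
```

A learning-based scheduler of the kind the abstract describes would replace `round_robin_schedule` with a policy that selects UEs from the observed battery and channel states; the fixed policy here serves only as a simple reference point.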