On the effect of clock offsets and quantization on learning-based adversarial games

Bibliographic details
Published in:	Automatica (Oxford), September 2024, Vol. 167, Article 111762
Authors:	Fotiadis, Filippos, Kanellopoulos, Aris, Vamvoudakis, Kyriakos G., Hugues, Jerome
Format:	Article
Language:	English
Keywords:	Clock offsets; Learning; Quantization; Zero-sum games

Description
Abstract:	In this work, we consider systems whose components suffer from clock offsets and quantization, and we study their effect on a reinforcement learning (RL) algorithm. Specifically, we consider an off-policy iterative RL algorithm for continuous-time systems that uses input and state data to approximate the Nash equilibrium of a zero-sum game. However, the data used by this algorithm are not consistent with one another: each sample originates from a slightly different time instant in the past, which calls the convergence of the algorithm into question. We prove that, provided these timing inconsistencies remain below a certain threshold, the iterative off-policy RL algorithm still converges ε-closely to the desired Nash policy. This result, however, is conditional on a Lipschitz continuity and differentiability condition on the collected input-state data, which is indispensable in the presence of clock offsets. A similar result is derived when the measured state is quantized. Finally, unlike prior work, we provide a sufficiently rich data condition for executing the iterative RL algorithm that can be verified a priori across all iteration indices. Simulations verify and clarify the theoretical findings.
ISSN:	0005-1098, 1873-2836
DOI:	10.1016/j.automatica.2024.111762
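The abstract does not state the algorithm's equations. As a rough illustration only, the sketch below runs a classical model-based policy iteration of the kind that data-driven zero-sum schemes are commonly designed to reproduce. It assumes a hypothetical scalar linear-quadratic game x' = a·x + b·u + d·w with cost ∫(q·x² + r·u² − γ²·w²)dt, a minimizing player u = −k·x and a maximizing player w = l·x. This is not the authors' off-policy algorithm: it uses the model coefficients (a, b, d) instead of input-state data. The optional `perturb` callback is a hypothetical stand-in for the bounded errors that clock offsets or quantization would introduce into each policy evaluation. All function names and numerical values are assumptions made for the example.

```python
import math


def game_riccati_root(a, b, d, q, r, gamma):
    """Stabilizing root of the scalar game Riccati equation
    2*a*p + q - s*p**2 = 0, with s = b**2/r - d**2/gamma**2 (assumed > 0)."""
    s = b**2 / r - d**2 / gamma**2
    if s <= 0:
        raise ValueError("gamma too small: no finite game value in this sketch")
    return (a + math.sqrt(a**2 + s * q)) / s


def zero_sum_policy_iteration(a, b, d, q, r, gamma, k0, iters=30, perturb=None):
    """Hypothetical model-based policy iteration for x' = a*x + b*u + d*w,
    with u = -k*x (minimizer) and w = l*x (maximizer). k0 must make a - b*k0 < 0.
    perturb(i), if given, returns an additive error on the i-th evaluated value."""
    k, l = k0, 0.0
    p = float("nan")
    for i in range(iters):
        ac = a - b * k + d * l  # closed-loop coefficient under the current policies
        if ac >= 0:
            raise ValueError(f"iteration {i}: closed loop is not stable")
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*ac*p + q + r*k**2 - gamma**2 * l**2 = 0 for p.
        p = -(q + r * k**2 - gamma**2 * l**2) / (2 * ac)
        if perturb is not None:
            p += perturb(i)  # stand-in for timing/quantization errors in the data
        # Policy improvement for both players from the same value estimate.
        k = b * p / r
        l = d * p / gamma**2
    return p, k, l


if __name__ == "__main__":
    a, b, d, q, r, gamma = 1.0, 1.0, 0.5, 1.0, 1.0, 2.0
    p_star = game_riccati_root(a, b, d, q, r, gamma)
    p_exact, _, _ = zero_sum_policy_iteration(a, b, d, q, r, gamma, k0=3.0)
    p_noisy, _, _ = zero_sum_policy_iteration(
        a, b, d, q, r, gamma, k0=3.0, perturb=lambda i: 0.01 * (-1) ** i
    )
    print(f"p* = {p_star:.6f}, exact PI = {p_exact:.6f}, perturbed PI = {p_noisy:.6f}")
```

Because both players update from the same evaluated value, this simultaneous iteration coincides with Newton's method on the scalar game Riccati equation, so the unperturbed run converges in a handful of iterations. In this toy setting, the perturbed run ends within about 0.01 of p* rather than exactly at p*. That is only a loose, qualitative analogue of the ε-close convergence described in the abstract, not a reproduction of the paper's bounds or conditions.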