Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards

Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary...

Bibliographic Details
Published in:	arXiv.org 2023-07
Main authors:	Ghoorchian, Saeed; Maghsudi, Setareh
Format:	Article
Language:	English
Subjects:	Algorithms; Combinatorial analysis; Decision making; Decision theory; Feedback; Graph theory; Learning; Linear functions; Multivariate statistical analysis; Nonstationary environments; Numerical analysis; Optimization; Performance degradation
Online access:	Full text