Fast Iterative model for Sequential-Selection-Based Applications

Bibliographic Details
Published in: International journal of computer & technology 2014-02, Vol.12 (7), p.3689-3696
Main authors: Amirizadeh, Khosrow, Mandava, Rajeswari
Format: Article
Language: English
Online access: Full text
Description
Summary: An accelerated multi-armed bandit (MAB) model in reinforcement learning for online sequential selection problems is presented. This iterative model uses an automatic step-size calculation that improves the performance of the MAB algorithm under different conditions, such as variable reward variance and a larger set of usable actions. As a result of these modifications, the number of optimal selections is maximized and the stability of the algorithm under these conditions may be improved. This adaptive model with automatic step-size computation may be attractive for online applications in which the variance of observations varies with time, so that re-tuning the step size is unavoidable even though it is not a simple task. The proposed model is governed by the upper confidence bound (UCB) approach in iterative form with automatic step-size computation. It is called adaptive UCB (AUCB) and may be used in industrial robotics, autonomous control, and intelligent selection or prediction tasks in economic engineering applications under a lack of information.
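For orientation, the general idea of a UCB rule written in iterative form can be sketched as follows. This is not the AUCB algorithm of the article, whose step-size formula is not given in this record. It is the standard UCB1 rule with an incremental mean update, where the step size 1/n is taken from the visit count rather than tuned by hand. The function name `ucb1`, the exploration constant `c`, and the Gaussian test arms are assumptions made for illustration.

```python
import math
import random


def ucb1(reward_fns, horizon, c=2.0):
    """Standard UCB1 with incremental mean updates (illustrative, not AUCB).

    reward_fns: list of zero-argument callables, one per arm (hypothetical).
    Returns (estimates, counts) after `horizon` selections.
    """
    k = len(reward_fns)
    estimates = [0.0] * k
    counts = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # pick the arm with the largest upper confidence bound
            arm = max(
                range(k),
                key=lambda a: estimates[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        step = 1.0 / counts[arm]  # step size derived from the visit count
        estimates[arm] += step * (reward - estimates[arm])  # iterative update
    return estimates, counts


# Assumed test problem: three Gaussian arms with fixed means and variance.
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
arms = [lambda m=m: rng.gauss(m, 0.1) for m in means]
estimates, counts = ucb1(arms, horizon=2000)
print(counts, [round(q, 3) for q in estimates])
```

With a 1/n step the estimate equals the sample mean, which is a reasonable choice only when the reward distribution is stationary. The time-varying variance mentioned in the summary is the case where a fixed or hand-tuned step size would normally need re-tuning.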
ISSN:2277-3061
DOI:10.24297/ijct.v12i7.3092