A comparison between UCB and UCB-Tuned as selection policies in GGP

In this paper, we present a comparative analysis of two selection policies in the General Game Playing (GGP) context: Upper Confidence Bound (UCB) and Upper Confidence Bound Tuned (UCB-Tuned). The aim of the analysis is to identify which policy has the best performance in terms of victories in the G...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Journal of intelligent & fuzzy systems 2019-01, Vol.36 (5), p.5073-5079
Hauptverfasser:	Francisco-Valencia, Iván, Marcial-Romero, José Raymundo, Valdovinos-Rosas, Rosa María
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer simulation Policies
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In this paper, we present a comparative analysis of two selection policies in the General Game Playing (GGP) context: Upper Confidence Bound (UCB) and Upper Confidence Bound Tuned (UCB-Tuned). The aim of the analysis is to identify which policy has the best performance in terms of victories in the GGP domain, a measure used in most of literature with other policies. In order to carry out the comparison, two agents were programmed using the GGP-base framework and the Monte Carlo Tree Search (MCTS) method. The games Breakthrough, Knightthrough and Connect Four were used as experimental scenarios, not compared previously to the best of our knowledge. The results show that UCB-Tuned is better when less than 100 simulations are used in MCTS; however, when 1000 simulations are used, both policies have similar performance.
ISSN:	1064-1246 1875-8967
DOI:	10.3233/JIFS-179052