Training a Gaming Agent on Brainwaves

Detailed description

Bibliographic details
Published in:	IEEE Transactions on Games, 2022-03, Vol. 14 (1), p. 85-92
Main authors:	Francisco, Bartolome; Moreno, Juan; Navas, Natalia; Vitali, Jose; Ramele, Rodrigo; Santos, Juan Miguel
Format:	Article
Language:	English
Subjects:	Agent; Algorithms; artificial intelligence (AI); brain–computer interface (BCI); Electrodes; electroencephalographic (EEG); Electroencephalography; error-related potential (ErrP); Errors; Games; Machine learning; Reinforcement learning; reinforcement learning (RL); Servers; Signal processing algorithms; Visualization

Description
Abstract:	Error-related potentials (ErrPs) are a particular type of event-related potential elicited when a person attends to a recognizable error. These electroencephalographic signals can be used, through a reinforcement learning algorithm, to train a gaming agent to learn an optimal policy. In the experimental process, an observational human critic (OHC) observes a simple game scenario while their brain signals are captured. The game consists of a grid on which a blue spot has to reach a desired target in the fewest steps. Results show that there is an effective transfer of information and that the agent successfully learns to solve the game efficiently, going from the 97 steps initially required on average to reach the target to the optimal number of eight steps. Our results are threefold: the mechanics of a simple grid-based game that can elicit the ErrP signal component; the verification that the reward function only penalizes wrong steps, which means that type II errors (failing to identify a wrong movement) do not significantly affect the agent's learning process; and the finding that collaborative rewards from multiple OHCs can be used to train the algorithm effectively and can compensate for low classification accuracies and the limited scope of transfer learning schemes.
ISSN:	2475-1502 2475-1510
DOI:	10.1109/TG.2020.3042900
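
Illustrative sketch (not taken from the article): the abstract describes an agent that learns from a reward function which only penalizes wrong steps, and reports that missed detections (type II errors) do not significantly affect learning. The Python sketch below illustrates that idea with plain tabular Q-learning and a simulated critic. Everything specific in it is an assumption made for illustration: the 5x5 grid, the start and target cells, the definition of a "wrong" step as one that does not reduce the Manhattan distance to the target, the miss_rate parameter, the absence of false alarms, and all hyperparameters. It does not reproduce the authors' implementation or their EEG classifier.

```python
import random

SIZE = 5                        # assumed 5x5 grid; the record gives no dimensions
START, TARGET = (0, 0), (4, 4)  # assumed cells, 8 moves apart
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def distance(cell):
    """Manhattan distance from a cell to the target."""
    return abs(cell[0] - TARGET[0]) + abs(cell[1] - TARGET[1])


def move(cell, action):
    """Apply an action; a move that would leave the grid keeps the agent in place."""
    row, col = cell[0] + action[0], cell[1] + action[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col)
    return cell


def critic_penalty(old, new, miss_rate, rng):
    """Simulated critic (hypothetical stand-in for ErrP detection).

    Returns -1.0 when a wrong step is detected and 0.0 otherwise. A step is
    treated as wrong if it does not bring the agent closer to the target.
    With probability miss_rate a wrong step goes unnoticed (type II error).
    Correct steps are never penalized in this sketch.
    """
    wrong = distance(new) >= distance(old)
    if wrong and rng.random() >= miss_rate:
        return -1.0
    return 0.0


def greedy_action(q, cell):
    """Index of the highest-valued action; the first one wins ties."""
    values = q[cell]
    return values.index(max(values))


def train(episodes=300, miss_rate=0.3, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=200, seed=0):
    """Tabular Q-learning driven only by the critic's penalties."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        cell = START
        for _ in range(max_steps):
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q, cell)
            nxt = move(cell, ACTIONS[a])
            reward = critic_penalty(cell, nxt, miss_rate, rng)
            # The target is terminal, so it contributes no future value.
            future = 0.0 if nxt == TARGET else max(q[nxt])
            q[cell][a] += alpha * (reward + gamma * future - q[cell][a])
            cell = nxt
            if cell == TARGET:
                break
    return q


def greedy_steps(q, max_steps=200):
    """Steps the greedy policy takes to reach the target, capped at max_steps."""
    cell, steps = START, 0
    while cell != TARGET and steps < max_steps:
        cell = move(cell, ACTIONS[greedy_action(q, cell)])
        steps += 1
    return steps


if __name__ == "__main__":
    q_table = train()
    print("greedy path length after training:", greedy_steps(q_table))
```

Under these assumptions, correct steps keep a value of zero while any wrong step that has been flagged at least once stays negative, so a missed detection only delays learning rather than misleading the agent. Varying miss_rate in train() is one way to explore that effect in this toy setting.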