Crossmodal attentive skill learner: learning in Atari and beyond with audio–video inputs

Bibliographic details
Published in:	Autonomous Agents and Multi-Agent Systems, 2020-04, Vol. 34 (1), Article 16
Main authors:	Kim, Dong-Ki; Omidshafiei, Shayegan; Pazis, Jason; How, Jonathan P.
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Automation & Control Systems; Computer Science; Computer Science, Artificial Intelligence; Computer Systems Organization and Communication Networks; Learning; Performance enhancement; Science & Technology; Software Engineering/Programming and Operating Systems; Technology; User Interfaces and Human Computer Interaction

Description
Abstract:	This paper introduces the Crossmodal Attentive Skill Learner (CASL), integrated with the recently introduced Asynchronous Advantage Option-Critic architecture [Harb et al. in When waiting is not an option: learning options with a deliberation cost. arXiv preprint arXiv:1709.04571, 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. Agents trained using our approach learn to attend to their various sensory modalities (e.g., audio, video) at the appropriate moments, thereby executing actions based on multiple sensory streams without reliance on supervisory data. We demonstrate empirically that the sensory attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. Further, we provide concrete examples in which the approach not only improves performance in a single task, but also accelerates transfer to new tasks. We modify the Arcade Learning Environment [Bellemare et al. in J Artif Intell Res 47:253–279, 2013] to support audio queries (ALE-audio code available at https://github.com/shayegano/Arcade-Learning-Environment), and conduct evaluations of crossmodal learning in the Atari 2600 games H.E.R.O. and Amidar. Finally, building on the recent work of Babaeizadeh et al. [in: International conference on learning representations (ICLR), 2017], we open-source a fast hybrid CPU–GPU implementation of CASL (CASL code available at https://github.com/shayegano/CASL).
ISSN:	1387-2532 1573-7454
DOI:	10.1007/s10458-019-09439-5