Mastering Atari, Go, chess and shogi by planning with a learned model

Bibliographic details
Published in: Nature (London), Vol. 588 (7839), December 2020, pp. 604-609
Authors: Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David
Format: Article
Language: English
Online access: Full text
Description
Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games [3] (the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled [4]), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm [5] that was supplied with the rules of the game. A reinforcement-learning algorithm that combines a tree-based search with a learned model achieves superhuman performance in high-performance planning and visually complex domains, without any knowledge of their underlying dynamics.
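
The abstract describes the core idea: the learned model outputs the three quantities planning needs (a policy, a value and a reward), and a tree-based search plans entirely inside that model, never querying the true environment dynamics. The Python sketch below is only an illustration of such an interface under assumed names (MuZeroStyleModel, initial_inference, recurrent_inference, one_step_lookahead); it is not the authors' implementation, and the single-step lookahead merely stands in for the paper's Monte Carlo tree search.

from dataclasses import dataclass
from typing import Dict, Tuple
import numpy as np


@dataclass
class ModelOutput:
    reward: float              # predicted immediate reward
    value: float               # predicted value of the hidden state
    policy: Dict[int, float]   # predicted prior probability per action
    hidden_state: np.ndarray   # internal state with no environment semantics


class MuZeroStyleModel:
    """Hypothetical interface: representation, dynamics and prediction networks."""

    def initial_inference(self, observation: np.ndarray) -> ModelOutput:
        # Encode the raw observation into a hidden state and predict the
        # policy and value used at the root of the search.
        raise NotImplementedError

    def recurrent_inference(self, hidden_state: np.ndarray, action: int) -> ModelOutput:
        # Advance the hidden state by one imagined action and predict the
        # reward, policy and value of the resulting state.
        raise NotImplementedError


def one_step_lookahead(model: MuZeroStyleModel,
                       observation: np.ndarray,
                       actions: Tuple[int, ...],
                       discount: float = 0.99) -> int:
    # Toy stand-in for the tree search: score each action by its predicted
    # reward plus the discounted predicted value, using only the learned model.
    root = model.initial_inference(observation)
    best_action, best_score = actions[0], float("-inf")
    for a in actions:
        out = model.recurrent_inference(root.hidden_state, a)
        score = out.reward + discount * out.value
        if score > best_score:
            best_action, best_score = a, score
    return best_action

In the paper itself, planning expands a search tree over these model predictions rather than stopping after a single imagined step, which is what allows the same procedure to cover both board games and Atari.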
ISSN: 0028-0836 (print); 1476-4687 (electronic)
DOI: 10.1038/s41586-020-03051-4