A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Detailed Description

Saved in:
Bibliographic Details
Published in: Science (American Association for the Advancement of Science), 2018-12, Vol. 362 (6419), p. 1140-1144
Main authors: Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis
Format: Article
Language: English
Subjects:
Online access: Full text
description The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
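The description above outlines the core idea: starting from random play, with no domain knowledge beyond the rules, a single agent improves by playing games against itself and learning from the outcomes. The following is a minimal, purely illustrative sketch of that self-play loop, not the paper's method: AlphaZero uses a deep neural network guided by Monte Carlo tree search, whereas here a lookup-table value function, epsilon-greedy negamax move selection, and single-pile Nim (take 1-3 stones; taking the last stone wins) are stand-in assumptions chosen to keep the example self-contained.

```python
import random

random.seed(0)
V = {0: -1.0}  # value for the player to move; facing 0 stones means you lost

def moves(n):
    return [m for m in (1, 2, 3) if m <= n]

def value(n):
    return V.get(n, 0.0)  # unseen states start at a neutral prior

def pick(n, eps=0.1):
    # Negamax move choice: minimise the value of the position handed to the
    # opponent, with a little exploration (a crude stand-in for tree search).
    if random.random() < eps:
        return random.choice(moves(n))
    return min(moves(n), key=lambda m: value(n - m))

def self_play_episode(n=10, alpha=0.05):
    trajectory, player = [], 0
    while n > 0:
        trajectory.append((player, n))
        n -= pick(n)
        player ^= 1
    loser = player  # the player now to move faces 0 stones and has lost
    # Monte Carlo update: nudge each visited state toward the final outcome,
    # from the perspective of the player who was to move there.
    for p, s in trajectory:
        target = -1.0 if p == loser else 1.0
        V[s] = value(s) + alpha * (target - value(s))

for _ in range(5000):
    self_play_episode()

best = min(moves(10), key=lambda m: value(10 - m))
print("preferred opening move from 10 stones:", best)
```

The table `V` plays the role the value network plays in AlphaZero: it is the only thing that persists between games, and it is trained exclusively on the agent's own game results.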
doi 10.1126/science.aar6404
format Article
publisher United States: The American Association for the Advancement of Science
pmid 30523106
orcid 0000-0001-5995-5264; 0000-0003-3957-0310
rights Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
fulltext fulltext
identifier ISSN: 0036-8075
ispartof Science (American Association for the Advancement of Science), 2018-12, Vol.362 (6419), p.1140-1144
issn 0036-8075
1095-9203
language eng
recordid cdi_proquest_miscellaneous_2155927626
source American Association for the Advancement of Science; Jstor Complete Legacy
subjects Adaptation
Algorithms
Artificial intelligence
Chess
Computers
Games
Go/no-go discrimination learning
Machine learning
Mathematics
Reinforcement
State of the art
title A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T13%3A58%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20general%20reinforcement%20learning%20algorithm%20that%20masters%20chess,%20shogi,%20and%20Go%20through%20self-play&rft.jtitle=Science%20(American%20Association%20for%20the%20Advancement%20of%20Science)&rft.au=Silver,%20David&rft.date=2018-12-07&rft.volume=362&rft.issue=6419&rft.spage=1140&rft.epage=1144&rft.pages=1140-1144&rft.issn=0036-8075&rft.eissn=1095-9203&rft_id=info:doi/10.1126/science.aar6404&rft_dat=%3Cproquest_cross%3E2155927626%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2153686075&rft_id=info:pmid/30523106&rfr_iscdi=true