A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
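The core idea the abstract describes — reinforcement learning purely from self-play, starting from random play with no domain knowledge beyond the game rules — can be illustrated with a toy sketch. Everything below (the miniature take-1-or-2 stone game, the tabular value function, the epsilon-greedy policy, all names) is illustrative only and is not the paper's actual method, which couples a deep neural network with Monte Carlo tree search:

```python
import random

# Toy illustration of self-play reinforcement learning (NOT AlphaZero's actual
# method). Game: players alternately remove 1 or 2 stones from a pile; whoever
# takes the last stone wins. Losing positions are the multiples of 3.

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def self_play_train(start_pile=7, episodes=5000, eps=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    # value[p]: estimated outcome for the player to move at pile size p
    # (+1 = win, -1 = loss), learned purely from self-play game outcomes.
    value = {p: 0.0 for p in range(start_pile + 1)}
    for _ in range(episodes):
        pile, history = start_pile, []
        while pile > 0:
            history.append(pile)
            moves = legal_moves(pile)
            if rng.random() < eps:   # explore occasionally
                move = rng.choice(moves)
            else:                    # exploit: leave the opponent the worst position
                move = min(moves, key=lambda m: -1.0 if pile - m == 0 else value[pile - m])
            pile -= move
        # The player who moved last took the final stone and won; walk back
        # through the game, flipping the outcome sign between the two players.
        outcome = 1.0
        for p in reversed(history):
            value[p] += lr * (outcome - value[p])
            outcome = -outcome
    return value

values = self_play_train()
```

Both "players" share one value table, so each improvement immediately strengthens the opponent as well — a (very loose) analogue of the self-play dynamic the paper scales up with deep networks and search.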
Published in: | Science (American Association for the Advancement of Science), 2018-12, Vol.362 (6419), p.1140-1144 |
---|---|
Main authors: | Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis |
Format: | Article |
Language: | English |
Subjects: | Adaptation; Algorithms; Artificial intelligence; Chess; Computers; Games; Go/no-go discrimination learning; Machine learning; Mathematics; Reinforcement; State of the art |
Online access: | Full text |
DOI | 10.1126/science.aar6404 |
---|---|
ISSN | 0036-8075 |
EISSN | 1095-9203 |
PMID | 30523106 |
Publisher | United States: The American Association for the Advancement of Science |
Rights | Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. |