Massively Parallel Methods for Deep Reinforcement Learning

We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). We applied our distributed algorithm to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
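The four components named in the abstract pass data in a simple loop: actors pull the latest network parameters and add new transitions to the experience store; learners sample stored transitions, compute gradients, and push them back to the shared parameter store. Each learner's update follows the standard DQN objective (the algorithm the paper distributes), with a periodically refreshed target network $\theta^-$:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]$$

Below is a minimal single-process Python sketch of that data flow. It is not the paper's implementation: the real system (Gorila) distributes each component across many machines and uses a convolutional network, whereas here the class names, the linear Q-function, and the toy environment are all illustrative assumptions made only so the loop runs end to end.

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

class ParameterServer:
    # Central copy of the Q-function weights: learners push gradients,
    # actors pull fresh parameters. In the paper this role is played by
    # a parameter server sharded across machines.
    def __init__(self, lr=0.05):
        self.theta = np.zeros((N_ACTIONS, STATE_DIM))
        self.lr = lr

    def push_gradient(self, grad):
        self.theta -= self.lr * grad

    def pull(self):
        return self.theta.copy()

class ReplayStore:
    # Stand-in for the distributed store of experience: a bounded
    # buffer of (s, a, r, s') transitions.
    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k):
        return random.sample(list(self.data), k)

def q_values(theta, s):
    # Linear stand-in for the DQN convnet: Q(s, a; theta) = theta[a] . s
    return theta @ s

def actor_step(server, store, rng, eps=0.1):
    # An actor pulls current parameters, acts eps-greedily in a toy
    # environment (random states; reward 1 for action 0), and stores
    # the resulting transition.
    theta = server.pull()
    s = rng.standard_normal(STATE_DIM)
    if rng.random() < eps:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(q_values(theta, s)))
    r = 1.0 if a == 0 else 0.0
    store.add((s, a, r, rng.standard_normal(STATE_DIM)))

def learner_step(server, store, target_theta, batch=32):
    # A learner samples a minibatch, forms one-step Q-learning targets
    # with the (stale) target parameters, and pushes the gradient of
    # the squared TD error back to the parameter server.
    theta = server.pull()
    grad = np.zeros_like(theta)
    for s, a, r, s_next in store.sample(batch):
        y = r + GAMMA * np.max(q_values(target_theta, s_next))
        td_error = q_values(theta, s)[a] - y
        grad[a] += td_error * s  # gradient of 0.5 * td_error**2 w.r.t. theta[a]
    server.push_gradient(grad / batch)

rng = np.random.default_rng(0)
server, store = ParameterServer(), ReplayStore()
target_theta = server.pull()
for step in range(2000):
    actor_step(server, store, rng)                 # many actors in parallel
    if len(store.data) >= 32:
        learner_step(server, store, target_theta)  # and many learners
    if step % 250 == 0:
        target_theta = server.pull()               # periodic target refresh
```

Note that the actors and learners communicate only through the parameter server and the replay store; this decoupling is what allows each component to be scaled out independently.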

Full Description

Saved in:
Bibliographic Details
Main Authors: Nair, Arun, Srinivasan, Praveen, Blackwell, Sam, Alcicek, Cagdas, Fearon, Rory, De Maria, Alessandro, Panneershelvam, Vedavyas, Suleyman, Mustafa, Beattie, Charles, Petersen, Stig, Legg, Shane, Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Nair, Arun
Srinivasan, Praveen
Blackwell, Sam
Alcicek, Cagdas
Fearon, Rory
De Maria, Alessandro
Panneershelvam, Vedavyas
Suleyman, Mustafa
Beattie, Charles
Petersen, Stig
Legg, Shane
Mnih, Volodymyr
Kavukcuoglu, Koray
Silver, David
description We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). We applied our distributed algorithm to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
doi 10.48550/arxiv.1507.04296
format Article
identifier DOI: 10.48550/arxiv.1507.04296
language eng
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Learning
Computer Science - Neural and Evolutionary Computing
title Massively Parallel Methods for Deep Reinforcement Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-02T08%3A42%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Massively%20Parallel%20Methods%20for%20Deep%20Reinforcement%20Learning&rft.au=Nair,%20Arun&rft.date=2015-07-15&rft_id=info:doi/10.48550/arxiv.1507.04296&rft_dat=%3Carxiv_GOX%3E1507_04296%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true