Massively Parallel Methods for Deep Reinforcement Learning

We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). We applied our distributed algorithm to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
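The four components named in the abstract pass data in a simple loop: actors pull the latest network parameters and add new transitions to the experience store; learners sample stored transitions, compute gradients, and push them back to the shared parameter store. Each learner's update follows the standard DQN objective (the algorithm the paper distributes), with a periodically refreshed target network $\theta^-$:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]$$

Below is a minimal single-process Python sketch of that data flow. It is not the paper's implementation: the real system (Gorila) distributes each component across many machines and uses a convolutional network, whereas here the class names, the linear Q-function, and the toy environment are all illustrative assumptions made only so the loop runs end to end.

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

class ParameterServer:
    # Central copy of the Q-function weights: learners push gradients,
    # actors pull fresh parameters. In the paper this role is played by
    # a parameter server sharded across machines.
    def __init__(self, lr=0.05):
        self.theta = np.zeros((N_ACTIONS, STATE_DIM))
        self.lr = lr

    def push_gradient(self, grad):
        self.theta -= self.lr * grad

    def pull(self):
        return self.theta.copy()

class ReplayStore:
    # Stand-in for the distributed store of experience: a bounded
    # buffer of (s, a, r, s') transitions.
    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k):
        return random.sample(list(self.data), k)

def q_values(theta, s):
    # Linear stand-in for the DQN convnet: Q(s, a; theta) = theta[a] . s
    return theta @ s

def actor_step(server, store, rng, eps=0.1):
    # An actor pulls current parameters, acts eps-greedily in a toy
    # environment (random states; reward 1 for action 0), and stores
    # the resulting transition.
    theta = server.pull()
    s = rng.standard_normal(STATE_DIM)
    if rng.random() < eps:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(q_values(theta, s)))
    r = 1.0 if a == 0 else 0.0
    store.add((s, a, r, rng.standard_normal(STATE_DIM)))

def learner_step(server, store, target_theta, batch=32):
    # A learner samples a minibatch, forms one-step Q-learning targets
    # with the (stale) target parameters, and pushes the gradient of
    # the squared TD error back to the parameter server.
    theta = server.pull()
    grad = np.zeros_like(theta)
    for s, a, r, s_next in store.sample(batch):
        y = r + GAMMA * np.max(q_values(target_theta, s_next))
        td_error = q_values(theta, s)[a] - y
        grad[a] += td_error * s  # gradient of 0.5 * td_error**2 w.r.t. theta[a]
    server.push_gradient(grad / batch)

rng = np.random.default_rng(0)
server, store = ParameterServer(), ReplayStore()
target_theta = server.pull()
for step in range(2000):
    actor_step(server, store, rng)                 # many actors in parallel
    if len(store.data) >= 32:
        learner_step(server, store, target_theta)  # and many learners
    if step % 250 == 0:
        target_theta = server.pull()               # periodic target refresh
```

Note that the actors and learners communicate only through the parameter server and the replay store; this decoupling is what allows each component to be scaled out independently.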

Full Description

Saved in:
Bibliographic Details
Main Authors: Nair, Arun, Srinivasan, Praveen, Blackwell, Sam, Alcicek, Cagdas, Fearon, Rory, De Maria, Alessandro, Panneershelvam, Vedavyas, Suleyman, Mustafa, Beattie, Charles, Petersen, Stig, Legg, Shane, Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Nair, Arun
Srinivasan, Praveen
Blackwell, Sam
Alcicek, Cagdas
Fearon, Rory
De Maria, Alessandro
Panneershelvam, Vedavyas
Suleyman, Mustafa
Beattie, Charles
Petersen, Stig
Legg, Shane
Mnih, Volodymyr
Kavukcuoglu, Koray
Silver, David
description We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). We applied our distributed algorithm to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
doi 10.48550/arxiv.1507.04296
format Article
identifier DOI: 10.48550/arxiv.1507.04296
language eng
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Learning
Computer Science - Neural and Evolutionary Computing
title Massively Parallel Methods for Deep Reinforcement Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-02T08%3A42%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Massively%20Parallel%20Methods%20for%20Deep%20Reinforcement%20Learning&rft.au=Nair,%20Arun&rft.date=2015-07-15&rft_id=info:doi/10.48550/arxiv.1507.04296&rft_dat=%3Carxiv_GOX%3E1507_04296%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true