Learning a Behavioral Repertoire from Demonstrations

Bibliographic Details
Main Authors: Justesen, Niels; Duque, Miguel Gonzalez; Jaramillo, Daniel Cabarcas; Mouret, Jean-Baptiste; Risi, Sebastian
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Published: 2019-07-05
DOI: 10.48550/arxiv.1907.03046
Source: arXiv.org
Online Access: https://arxiv.org/abs/1907.03046
Description:
Imitation Learning (IL) is a machine learning approach to learn a policy from a dataset of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL), but it can also be useful on its own, e.g. to learn to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy based on a dataset that possibly contains demonstrations of numerous different types of behaviors. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we are able to adapt the behavior of the policy between games to reach a performance beyond that of the traditional IL baseline approach.
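As a rough illustration of the core idea (a minimal sketch in Python/PyTorch; the layer sizes, names, and dimensions are our own assumptions, not the authors' architecture), a single policy network can be conditioned on a behavioral description by concatenating the descriptor to the state features:

import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    # One network expresses many behaviors: the action distribution
    # depends on both the game state and the behavioral descriptor.
    def __init__(self, state_dim: int, behavior_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + behavior_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor, behavior: torch.Tensor) -> torch.Tensor:
        # Augment the state with the behavioral description, mirroring
        # the augmented state-action pairs described in the abstract.
        x = torch.cat([state, behavior], dim=-1)
        return torch.softmax(self.net(x), dim=-1)

policy = ConditionedPolicy(state_dim=64, behavior_dim=2, n_actions=10)
action_probs = policy(torch.randn(1, 64), torch.randn(1, 2))

Feeding the same state with different descriptors yields different action distributions, which is what lets a single network act as a behavioral repertoire.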
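The behavioral descriptors themselves are obtained, per the abstract, by applying PCA to each demonstration's army unit composition. A minimal sketch with scikit-learn, where the data is a random placeholder standing in for real StarCraft II unit counts:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# One row per demonstration: normalized counts of each army unit type.
# 40 unit types and 1000 demonstrations are arbitrary placeholder sizes.
unit_compositions = rng.random((1000, 40))
unit_compositions /= unit_compositions.sum(axis=1, keepdims=True)

# Project the high-dimensional compositions into a low-dimensional
# behavioral space; each row becomes one demonstration's descriptor.
pca = PCA(n_components=2)
behavior_descriptors = pca.fit_transform(unit_compositions)
print(behavior_descriptors.shape)  # (1000, 2)

These descriptors are what the conditioned policy above would be trained on alongside the state-action pairs.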
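The between-game adaptation can be sketched as a UCB1 bandit whose arms are candidate behavioral descriptors and whose reward is the game outcome; simulate_game and the win rates below are hypothetical stand-ins for playing a full game with the conditioned policy:

import math
import random

def ucb1_select(wins, plays, total_plays):
    # Play any untried arm first; otherwise maximize mean reward
    # plus the UCB1 exploration bonus.
    for i, n in enumerate(plays):
        if n == 0:
            return i
    scores = [wins[i] / plays[i] + math.sqrt(2 * math.log(total_plays) / plays[i])
              for i in range(len(plays))]
    return scores.index(max(scores))

def simulate_game(arm):
    # Hypothetical per-behavior win rates; a real setup would run the
    # conditioned policy in StarCraft II with the chosen descriptor.
    return random.random() < [0.3, 0.5, 0.6][arm]

wins, plays = [0, 0, 0], [0, 0, 0]
for t in range(1, 201):
    arm = ucb1_select(wins, plays, t)
    plays[arm] += 1
    wins[arm] += int(simulate_game(arm))
print(plays)  # most plays concentrate on the best-performing behavior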