Learning a Behavioral Repertoire from Demonstrations

Bibliographic Details
Main Authors: Justesen, Niels; Duque, Miguel Gonzalez; Jaramillo, Daniel Cabarcas; Mouret, Jean-Baptiste; Risi, Sebastian
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Published: 2019-07-05
DOI: 10.48550/arxiv.1907.03046
Source: arXiv.org
Online Access: https://arxiv.org/abs/1907.03046
Description:
Imitation Learning (IL) is a machine learning approach to learn a policy from a dataset of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL), but it can also be useful on its own, e.g. to learn to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy based on a dataset that possibly contains demonstrations of numerous different types of behaviors. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we are able to adapt the behavior of the policy between games to reach a performance beyond that of the traditional IL baseline approach.
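As a rough illustration of the core idea (a minimal sketch in Python/PyTorch; the layer sizes, names, and dimensions are our own assumptions, not the authors' architecture), a single policy network can be conditioned on a behavioral description by concatenating the descriptor to the state features:

import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    # One network expresses many behaviors: the action distribution
    # depends on both the game state and the behavioral descriptor.
    def __init__(self, state_dim: int, behavior_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + behavior_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor, behavior: torch.Tensor) -> torch.Tensor:
        # Augment the state with the behavioral description, mirroring
        # the augmented state-action pairs described in the abstract.
        x = torch.cat([state, behavior], dim=-1)
        return torch.softmax(self.net(x), dim=-1)

policy = ConditionedPolicy(state_dim=64, behavior_dim=2, n_actions=10)
action_probs = policy(torch.randn(1, 64), torch.randn(1, 2))

Feeding the same state with different descriptors yields different action distributions, which is what lets a single network act as a behavioral repertoire.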
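The behavioral descriptors themselves are obtained, per the abstract, by applying PCA to each demonstration's army unit composition. A minimal sketch with scikit-learn, where the data is a random placeholder standing in for real StarCraft II unit counts:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# One row per demonstration: normalized counts of each army unit type.
# 40 unit types and 1000 demonstrations are arbitrary placeholder sizes.
unit_compositions = rng.random((1000, 40))
unit_compositions /= unit_compositions.sum(axis=1, keepdims=True)

# Project the high-dimensional compositions into a low-dimensional
# behavioral space; each row becomes one demonstration's descriptor.
pca = PCA(n_components=2)
behavior_descriptors = pca.fit_transform(unit_compositions)
print(behavior_descriptors.shape)  # (1000, 2)

These descriptors are what the conditioned policy above would be trained on alongside the state-action pairs.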
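The between-game adaptation can be sketched as a UCB1 bandit whose arms are candidate behavioral descriptors and whose reward is the game outcome; simulate_game and the win rates below are hypothetical stand-ins for playing a full game with the conditioned policy:

import math
import random

def ucb1_select(wins, plays, total_plays):
    # Play any untried arm first; otherwise maximize mean reward
    # plus the UCB1 exploration bonus.
    for i, n in enumerate(plays):
        if n == 0:
            return i
    scores = [wins[i] / plays[i] + math.sqrt(2 * math.log(total_plays) / plays[i])
              for i in range(len(plays))]
    return scores.index(max(scores))

def simulate_game(arm):
    # Hypothetical per-behavior win rates; a real setup would run the
    # conditioned policy in StarCraft II with the chosen descriptor.
    return random.random() < [0.3, 0.5, 0.6][arm]

wins, plays = [0, 0, 0], [0, 0, 0]
for t in range(1, 201):
    arm = ucb1_select(wins, plays, t)
    plays[arm] += 1
    wins[arm] += int(simulate_game(arm))
print(plays)  # most plays concentrate on the best-performing behavior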