Learning a Behavioral Repertoire from Demonstrations
Imitation Learning (IL) is a machine learning approach for learning a policy from a dataset of demonstrations. IL can kick-start learning before reinforcement learning (RL) is applied, but it is also useful on its own, e.g. to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy from a dataset that may contain demonstrations of many different types of behavior. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we can adapt the behavior of the policy between games to reach performance beyond that of the traditional IL baseline.
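The core idea in the abstract is a single network conditioned on a behavior descriptor. The paper's code is not reproduced here, so the following is a minimal sketch of one way such conditioning can work, assuming a plain MLP; the class name, layer sizes, and dimensions are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a behavior-conditioned policy: the descriptor is
# concatenated onto the state, so one network can express many behaviors.
import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):  # illustrative name, not from the paper
    def __init__(self, state_dim: int, behavior_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + behavior_dim, 256),  # assumed hidden size
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, state: torch.Tensor, behavior: torch.Tensor) -> torch.Tensor:
        # Changing `behavior` at inference time modulates the expressed behavior.
        return self.net(torch.cat([state, behavior], dim=-1))

policy = ConditionedPolicy(state_dim=64, behavior_dim=2, n_actions=10)
logits = policy(torch.randn(1, 64), torch.tensor([[0.3, -0.1]]))
```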
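The behavioral descriptors themselves come from PCA over each demonstration's unit composition. Below is a sketch of that step under stated assumptions: the array `unit_compositions` is a placeholder for per-replay unit-count vectors, and 40 dimensions is an invented example size.

```python
# Hypothetical sketch: project high-dimensional unit-composition vectors into
# a 2-D behavioral space with PCA, as the abstract describes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
unit_compositions = rng.random((7777, 40))  # placeholder: one row per replay

pca = PCA(n_components=2)
behavior_descriptors = pca.fit_transform(unit_compositions)  # shape (7777, 2)
# Each 2-D descriptor can then be attached to that demonstration's
# state-action pairs before imitation learning.
```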
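Finally, the between-game adaptation uses UCB1, treating candidate behavior descriptors as bandit arms. This sketch assumes a discrete set of behaviors and a `play_game` placeholder that returns a win (1) or loss (0); neither the arm count nor the win model is from the paper.

```python
# Hypothetical sketch of UCB1 over a discrete set of behaviors, choosing
# which descriptor to condition the policy on before each game.
import math
import random

def ucb1_select(wins, plays, total_plays):
    """Pick the arm (behavior index) maximizing the UCB1 score."""
    best, best_score = 0, float("-inf")
    for i in range(len(plays)):
        if plays[i] == 0:
            return i  # try every behavior at least once
        score = wins[i] / plays[i] + math.sqrt(2 * math.log(total_plays) / plays[i])
        if score > best_score:
            best, best_score = i, score
    return best

def play_game(behavior_index):
    # Placeholder for running one match with the chosen behavior.
    return random.random() < 0.4 + 0.05 * behavior_index

n_behaviors = 5
wins, plays = [0] * n_behaviors, [0] * n_behaviors
for t in range(1, 101):
    arm = ucb1_select(wins, plays, t)
    plays[arm] += 1
    wins[arm] += int(play_game(arm))
```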
Saved in:
Main Authors: | Justesen, Niels; Duque, Miguel Gonzalez; Jaramillo, Daniel Cabarcas; Mouret, Jean-Baptiste; Risi, Sebastian |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Learning |
Online Access: | Order full text |
creator | Justesen, Niels ; Duque, Miguel Gonzalez ; Jaramillo, Daniel Cabarcas ; Mouret, Jean-Baptiste ; Risi, Sebastian |
doi_str_mv | 10.48550/arxiv.1907.03046 |
format | Article |
identifier | DOI: 10.48550/arxiv.1907.03046 |
language | eng |
recordid | cdi_arxiv_primary_1907_03046 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning |
title | Learning a Behavioral Repertoire from Demonstrations |
url | https://arxiv.org/abs/1907.03046 |