Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
Saved in:
Main Authors: | DeepMind Interactive Agents Team; Abramson, Josh; Ahuja, Arun; Brussee, Arthur; Carnevale, Federico; Cassin, Mary; Fischer, Felix; Georgiev, Petko; Goldin, Alex; Gupta, Mansi; Harley, Tim; Hill, Felix; Humphreys, Peter C; Hung, Alden; Landon, Jessica; Lillicrap, Timothy; Merzic, Hamza; Muldal, Alistair; Santoro, Adam; Scully, Guy; von Glehn, Tamara; Wayne, Greg; Wong, Nathaniel; Yan, Chen; Zhu, Rui |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning |
Online Access: | Order full text |
creator | DeepMind Interactive Agents Team; Abramson, Josh; Ahuja, Arun; Brussee, Arthur; Carnevale, Federico; Cassin, Mary; Fischer, Felix; Georgiev, Petko; Goldin, Alex; Gupta, Mansi; Harley, Tim; Hill, Felix; Humphreys, Peter C; Hung, Alden; Landon, Jessica; Lillicrap, Timothy; Merzic, Hamza; Muldal, Alistair; Santoro, Adam; Scully, Guy; von Glehn, Tamara; Wayne, Greg; Wong, Nathaniel; Yan, Chen; Zhu, Rui |
description | A common vision from science fiction is that robots will one day inhabit our
physical spaces, sense the world as we do, assist our physical labours, and
communicate with us through natural language. Here we study how to design
artificial agents that can interact naturally with humans using the
simplification of a virtual environment. We show that imitation learning of
human-human interactions in a simulated world, in conjunction with
self-supervised learning, is sufficient to produce a multimodal interactive
agent, which we call MIA, that successfully interacts with non-adversarial
humans 75% of the time. We further identify architectural and algorithmic
techniques that improve performance, such as hierarchical action selection.
Altogether, our results demonstrate that imitation of multi-modal, real-time
human behaviour may provide a straightforward and surprisingly effective means
of imbuing agents with a rich behavioural prior from which agents might then be
fine-tuned for specific purposes, thus laying a foundation for training capable
agents for interactive robots or digital assistants. A video of MIA's behaviour
may be found at https://youtu.be/ZFgRhviF7mY |
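The abstract describes imitation of multimodal (vision plus language) human behaviour, trained with behavioural cloning and improved by hierarchical action selection. As a loose illustration of those two ideas only (the self-supervised component is omitted), here is a minimal PyTorch sketch. Everything in it is an assumption for illustration — the module shapes, the two-level group/action split, and the names `MultimodalImitationPolicy` and `bc_loss` — not the paper's actual MIA architecture.

```python
# Illustrative sketch only: behavioural cloning over multimodal trajectories
# with a two-level (group -> action) head, in the spirit of the abstract.
# All sizes and names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class MultimodalImitationPolicy(nn.Module):
    def __init__(self, vocab_size=1000, n_action_groups=4, n_actions_per_group=8):
        super().__init__()
        # Tiny CNN over RGB frames (stand-in for the agent's visual encoder).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Averaged token embeddings (stand-in for the language encoder).
        self.embed = nn.Embedding(vocab_size, 32)
        # Recurrent core over the fused vision+language stream.
        self.fuse = nn.GRU(64, 128, batch_first=True)
        # Hierarchical action selection: pick an action group first,
        # then condition the within-group action on that choice.
        self.group_head = nn.Linear(128, n_action_groups)
        self.action_head = nn.Linear(128 + n_action_groups, n_actions_per_group)

    def forward(self, frames, tokens):
        # frames: (B, T, 3, H, W); tokens: (B, T, L) integer token ids.
        B, T = frames.shape[:2]
        v = self.vision(frames.flatten(0, 1)).view(B, T, -1)   # (B, T, 32)
        l = self.embed(tokens).mean(dim=2)                     # (B, T, 32)
        h, _ = self.fuse(torch.cat([v, l], dim=-1))            # (B, T, 128)
        group_logits = self.group_head(h)
        action_logits = self.action_head(
            torch.cat([h, group_logits.softmax(-1)], dim=-1))
        return group_logits, action_logits

def bc_loss(policy, frames, tokens, group_targets, action_targets):
    """Behavioural cloning: maximise likelihood of the human's actions."""
    g, a = policy(frames, tokens)
    ce = nn.CrossEntropyLoss()
    return ce(g.flatten(0, 1), group_targets.flatten()) + \
           ce(a.flatten(0, 1), action_targets.flatten())

# Example call with dummy data: batch of 2, 5 timesteps, 64x64 frames, 7 tokens.
policy = MultimodalImitationPolicy()
frames = torch.randn(2, 5, 3, 64, 64)
tokens = torch.randint(0, 1000, (2, 5, 7))
loss = bc_loss(policy, frames, tokens,
               torch.randint(0, 4, (2, 5)), torch.randint(0, 8, (2, 5)))
```

The hierarchical head mirrors the abstract's point that factoring action selection (here, a coarse group choice feeding a finer within-group choice) can improve on a single flat action distribution.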
doi | 10.48550/arxiv.2112.03763 |
format | Article |
creationdate | 2021-12-07 |
rights | http://creativecommons.org/licenses/by/4.0 |
identifier | DOI: 10.48550/arxiv.2112.03763 |
language | eng |
recordid | cdi_arxiv_primary_2112_03763 |
source | arXiv.org |
subjects | Computer Science - Learning |
title | Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T22%3A05%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Creating%20Multimodal%20Interactive%20Agents%20with%20Imitation%20and%20Self-Supervised%20Learning&rft.au=DeepMind%20Interactive%20Agents%20Team&rft.date=2021-12-07&rft_id=info:doi/10.48550/arxiv.2112.03763&rft_dat=%3Carxiv_GOX%3E2112_03763%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |