Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning

A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.

Bibliographic Details
Main Authors: DeepMind Interactive Agents Team, Abramson, Josh, Ahuja, Arun, Brussee, Arthur, Carnevale, Federico, Cassin, Mary, Fischer, Felix, Georgiev, Petko, Goldin, Alex, Gupta, Mansi, Harley, Tim, Hill, Felix, Humphreys, Peter C, Hung, Alden, Landon, Jessica, Lillicrap, Timothy, Merzic, Hamza, Muldal, Alistair, Santoro, Adam, Scully, Guy, von Glehn, Tamara, Wayne, Greg, Wong, Nathaniel, Yan, Chen, Zhu, Rui
Format: Article
Language: English
Subjects: Computer Science - Learning
Online Access: Order full text
description A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection. Altogether, our results demonstrate that imitation of multi-modal, real-time human behaviour may provide a straightforward and surprisingly effective means of imbuing agents with a rich behavioural prior from which agents might then be fine-tuned for specific purposes, thus laying a foundation for training capable agents for interactive robots or digital assistants. A video of MIA's behaviour may be found at https://youtu.be/ZFgRhviF7mY
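The description credits imitation learning of human-human interactions, i.e. supervised training of a policy on recorded (observation, action) pairs, as the core of MIA's training. The following is a minimal behavioural-cloning sketch under toy assumptions: a linear softmax policy on synthetic data, with every name and shape invented for illustration — it is not the paper's architecture or data.

```python
# Behavioural cloning sketch: fit a policy to (observation, action) pairs
# from "demonstrators" by minimising the cross-entropy imitation loss.
# The linear policy and synthetic dataset are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy "demonstrations": 4-dim observations, 2 discrete actions, where the
# demonstrator acts according to a fixed linear rule we hope to recover.
obs = rng.normal(size=(256, 4))
actions = (obs @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(int)
n_actions = 2

W = np.zeros((4, n_actions))  # policy parameters

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Full-batch gradient descent on the cross-entropy between the policy's
# action distribution and the demonstrated actions.
for _ in range(200):
    p = softmax(obs @ W)
    onehot = np.eye(n_actions)[actions]
    W -= 0.1 * obs.T @ (p - onehot) / len(obs)

accuracy = (softmax(obs @ W).argmax(axis=1) == actions).mean()
```

Because the toy data is realisable by the policy class, the cloned policy reproduces the demonstrator's choices almost everywhere; the paper's point is that the same supervised recipe, scaled up to multimodal real-time behaviour, already yields a capable interactive agent.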
creationdate 2021-12-07
rights http://creativecommons.org/licenses/by/4.0
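The description above also names hierarchical action selection as one of the techniques that improved performance: a high-level decision picks an abstract option, which a low-level expansion turns into primitive actions. The sketch below illustrates only that two-level control flow; the option table, the cycling "policy", and all names are hypothetical stand-ins, not the agent's real interface.

```python
# Hierarchical action selection sketch: a coarse high-level choice is
# expanded into a sequence of primitive actions by a low-level layer.
# All options and primitives here are illustrative assumptions.

OPTIONS = {
    "move": ["forward", "forward", "turn"],
    "speak": ["utter_word", "utter_word"],
    "grasp": ["reach", "close_gripper"],
}

def high_level_policy(step):
    # Placeholder: cycle through options deterministically; a learned
    # model conditioned on observations and language would go here.
    return list(OPTIONS)[step % len(OPTIONS)]

def run_episode(n_decisions=3):
    trace = []
    for t in range(n_decisions):
        option = high_level_policy(t)          # coarse decision
        for primitive in OPTIONS[option]:      # low level expands it
            trace.append((option, primitive))  # primitive sent to the env
    return trace

trace = run_episode()
```

One commonly cited benefit of this split is that learning and credit assignment can operate over a handful of option-level decisions per episode instead of every primitive action, which shortens the effective horizon.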
identifier DOI: 10.48550/arxiv.2112.03763
source arXiv.org
subjects Computer Science - Learning
title Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
url https://arxiv.org/abs/2112.03763