FoMo Rewards: Can we cast foundation models as reward functions?

We explore the viability of casting foundation models as generic reward functions for reinforcement learning. To this end, we propose a simple pipeline that interfaces an off-the-shelf vision model with a large language model. Specifically, given a trajectory of observations, we infer the likelihood of an instruction describing the task that the user wants an agent to perform. We show that this generic likelihood function exhibits the characteristics ideally expected from a reward function: it associates high values with the desired behaviour and lower values for several similar, but incorrect policies. Overall, our work opens the possibility of designing open-ended agents for interactive tasks via foundation models.

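The pipeline described in the abstract can be summarised in a few lines. The sketch below is illustrative only: the names `trajectory_reward`, `describe`, and `log_likelihood` are hypothetical placeholders for the vision model, the language model, and the resulting reward signal, and are not taken from the paper's implementation.

```python
# Minimal sketch of the likelihood-as-reward idea, under the assumption that a
# vision model can caption observations and an LLM can score text likelihoods.
from typing import Callable, List, Sequence


def trajectory_reward(
    observations: Sequence,                        # e.g. a list of image frames
    instruction: str,                              # task the user wants performed
    describe: Callable[[object], str],             # vision model: observation -> caption
    log_likelihood: Callable[[str, str], float],   # LLM: log p(instruction | context)
) -> float:
    """Score a trajectory by how likely the instruction is given what was observed."""
    # 1. Turn each observation into a textual description with the vision model.
    captions: List[str] = [describe(obs) for obs in observations]

    # 2. Build a textual context summarising the trajectory.
    context = "Observed behaviour:\n" + "\n".join(captions)

    # 3. Use the LLM's likelihood of the instruction given that context as the reward.
    #    Trajectories that match the instruction should score higher than similar
    #    but incorrect behaviours, which is the property the paper examines.
    return log_likelihood(instruction, context)
```
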
Bibliographic details
Authors: Lubana, Ekdeep Singh; Brehmer, Johann; de Haan, Pim; Cohen, Taco
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
DOI: 10.48550/arxiv.2312.03881
Published: 2023-12-06
Source: arXiv.org
Online access: https://arxiv.org/abs/2312.03881