FoMo Rewards: Can we cast foundation models as reward functions?
We explore the viability of casting foundation models as generic reward functions for reinforcement learning. To this end, we propose a simple pipeline that interfaces an off-the-shelf vision model with a large language model. Specifically, given a trajectory of observations, we infer the likelihood...
Saved in:
Main Authors: | Lubana, Ekdeep Singh; Brehmer, Johann; de Haan, Pim; Cohen, Taco |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Learning |
Online Access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Lubana, Ekdeep Singh; Brehmer, Johann; de Haan, Pim; Cohen, Taco |
description | We explore the viability of casting foundation models as generic reward functions for reinforcement learning. To this end, we propose a simple pipeline that interfaces an off-the-shelf vision model with a large language model. Specifically, given a trajectory of observations, we infer the likelihood of an instruction describing the task that the user wants an agent to perform. We show that this generic likelihood function exhibits the characteristics ideally expected from a reward function: it associates high values with the desired behaviour and lower values for several similar, but incorrect policies. Overall, our work opens the possibility of designing open-ended agents for interactive tasks via foundation models. |
doi_str_mv | 10.48550/arxiv.2312.03881 |
format | Article |
creationdate | 2023-12-06 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2312.03881 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2312_03881 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Learning |
title | FoMo Rewards: Can we cast foundation models as reward functions? |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T01%3A26%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FoMo%20Rewards:%20Can%20we%20cast%20foundation%20models%20as%20reward%20functions?&rft.au=Lubana,%20Ekdeep%20Singh&rft.date=2023-12-06&rft_id=info:doi/10.48550/arxiv.2312.03881&rft_dat=%3Carxiv_GOX%3E2312_03881%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
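The abstract sketches a concrete pipeline: caption the trajectory with an off-the-shelf vision model, then score the user's instruction with a large language model and use that likelihood as the reward. Below is a minimal, hypothetical sketch of that idea; the function names, model handles, and scoring helpers are placeholders and do not reflect the authors' implementation.

```python
# Illustrative sketch only: every model handle and helper below is a
# hypothetical placeholder, not the implementation described in the paper.
from typing import Callable, Sequence


def likelihood_reward(
    observations: Sequence[object],
    instruction: str,
    caption: Callable[[object], str],
    instruction_logprob: Callable[[str, str], float],
) -> float:
    """Reward a trajectory by how likely the instruction is given the observations.

    Mirrors the pipeline in the abstract: a vision model turns observations into
    text, and a language model scores the log-likelihood of the instruction
    conditioned on that text. Trajectories that match the instruction should
    receive higher values than similar but incorrect ones.
    """
    # Describe each observation with the vision model.
    captions = [caption(obs) for obs in observations]
    context = "\n".join(captions)
    # Use log p(instruction | captions) from the language model as the reward.
    return instruction_logprob(instruction, context)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any real models.
    def toy_caption(obs: object) -> str:
        return f"the agent sees: {obs}"

    def toy_logprob(instruction: str, context: str) -> float:
        # Crude word-overlap score standing in for an LLM log-likelihood.
        shared = set(instruction.lower().split()) & set(context.lower().split())
        return float(len(shared)) - len(instruction.split())

    trajectory = ["a red door", "the red door is open"]
    print(likelihood_reward(trajectory, "open the red door", toy_caption, toy_logprob))
```

The point the abstract emphasises is that no task-specific reward engineering is needed: the same likelihood scorer can in principle be reused across instructions.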