Impact of Decentralized Learning on Player Utilities in Stackelberg Games
When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($O(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.
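To make the benchmark in the abstract concrete, here is a minimal sketch of the standard Stackelberg-equilibrium regret benchmark that the abstract says forces worst-case linear regret for at least one player. The notation is assumed for illustration and is not taken from the paper: $u_i$ is player $i$'s utility, $(x^{*}, y^{*})$ is a Stackelberg equilibrium profile, and $(x_t, y_t)$ are the actions played in round $t$ of $T$:

$$\mathrm{Reg}_i(T) \;=\; \sum_{t=1}^{T} \Bigl( u_i(x^{*}, y^{*}) \;-\; u_i(x_t, y_t) \Bigr)$$

The paper's relaxed benchmark weakens the comparator $u_i(x^{*}, y^{*})$ so that it tolerates small learning errors by the other agent; it is with respect to that relaxation that the authors obtain near-optimal $O(T^{2/3})$ regret for both players.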
Main authors: | Donahue, Kate; Immorlica, Nicole; Jagadeesan, Meena; Lucier, Brendan; Slivkins, Aleksandrs |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Science and Game Theory; Computer Science - Learning |
Online access: | Order full text |
creator | Donahue, Kate; Immorlica, Nicole; Jagadeesan, Meena; Lucier, Brendan; Slivkins, Aleksandrs |
---|---|
doi_str_mv | 10.48550/arxiv.2403.00188 |
format | Article |
creationdate | 2024-02-29 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
link | https://arxiv.org/abs/2403.00188 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2403.00188 |
language | eng |
recordid | cdi_arxiv_primary_2403_00188 |
source | arXiv.org |
subjects | Computer Science - Computer Science and Game Theory; Computer Science - Learning |
title | Impact of Decentralized Learning on Player Utilities in Stackelberg Games |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A25%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Impact%20of%20Decentralized%20Learning%20on%20Player%20Utilities%20in%20Stackelberg%20Games&rft.au=Donahue,%20Kate&rft.date=2024-02-29&rft_id=info:doi/10.48550/arxiv.2403.00188&rft_dat=%3Carxiv_GOX%3E2403_00188%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |