Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC
The difficulty of exploring and training online on real production systems limits the scope of real-time online data/feedback-driven decision making. The most feasible approach is to adopt offline reinforcement learning from limited trajectory samples. However, after deployment, such policies fail d...
Format: | Article |
Language: | English |
Abstract: | The difficulty of exploring and training online on real production systems
limits the scope of real-time, online data- and feedback-driven decision making.
The most feasible approach is to adopt offline reinforcement learning (RL) from
limited trajectory samples. However, after deployment, such policies fail due to
exogenous factors that temporarily or permanently disturb or alter the transition
distribution of the decision process structure induced by the offline samples.
This results in critical policy failures and generalization errors in sensitive
domains such as Real-Time Communication (RTC). We solve the crucial problem of
identifying robust actions in the presence of domain shifts caused by unseen
exogenous stochastic factors in the wild. Because offline policies learned within
the support of the offline data cannot be made robust to such unseen exogenous
disturbances, we propose Streetwise, a novel post-deployment policy-shaping method
conditioned on real-time characterization of out-of-distribution (OOD)
sub-spaces. This yields robust actions for bandwidth estimation (BWE) of network
bottlenecks in RTC and in standard benchmarks. Our extensive experiments on BWE
and other standard offline RL benchmark environments demonstrate a significant
improvement (approximately 18% in some scenarios) in final returns with respect to
end-user metrics over state-of-the-art baselines. |
---|---|
DOI: | 10.48550/arxiv.2411.06815 |
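The abstract describes Streetwise only at a high level. As a rough illustration of the general idea of conditioning a deployed offline policy on a real-time out-of-distribution (OOD) signal, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. The root-mean-square z-score used as the OOD score, the linear blending toward a conservative fallback action, and the names `OfflineStats`, `shaped_action`, `fallback` and `threshold` are all hypothetical choices made for illustration. In a BWE setting, the action could be read as a bandwidth estimate in bits per second.

```python
# Hypothetical illustration of OOD-conditioned action shaping.
# This is NOT the Streetwise algorithm from the paper.
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class OfflineStats:
    """Per-feature mean and standard deviation of states in the offline dataset."""
    mean: list[float]
    std: list[float]

    @classmethod
    def fit(cls, states: Sequence[Sequence[float]]) -> "OfflineStats":
        n = len(states)
        dims = len(states[0])
        mean = [sum(s[d] for s in states) / n for d in range(dims)]
        # A zero std (constant feature) is replaced by 1.0 to avoid division by zero.
        std = [
            math.sqrt(sum((s[d] - mean[d]) ** 2 for s in states) / n) or 1.0
            for d in range(dims)
        ]
        return cls(mean, std)

    def ood_score(self, state: Sequence[float]) -> float:
        """Root-mean-square z-score: small in-distribution, large when OOD."""
        z2 = [((x - m) / s) ** 2 for x, m, s in zip(state, self.mean, self.std)]
        return math.sqrt(sum(z2) / len(z2))


def shaped_action(
    policy: Callable[[Sequence[float]], float],
    stats: OfflineStats,
    state: Sequence[float],
    fallback: float,
    threshold: float = 3.0,
) -> float:
    """Blend the offline policy's action toward a conservative fallback
    as the state drifts outside the support of the offline data."""
    action = policy(state)
    score = stats.ood_score(state)
    if score <= threshold:
        return action  # in-distribution: trust the offline policy as-is
    # Weight on the fallback grows from 0 at the threshold toward 1 as score grows.
    w = 1.0 - threshold / score
    return (1.0 - w) * action + w * fallback


if __name__ == "__main__":
    stats = OfflineStats.fit([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
    policy = lambda s: 1_000_000.0  # hypothetical policy: always 1 Mbps
    print(shaped_action(policy, stats, [2.0, 12.0], fallback=300_000.0))
    print(shaped_action(policy, stats, [20.0, 12.0], fallback=300_000.0))
```

Blending rather than hard switching keeps the shaped action continuous in the OOD score, so a state just past the threshold is treated almost like an in-distribution one; a real system would need a better-founded OOD characterization than per-feature z-scores.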