Massively Scalable Inverse Reinforcement Learning in Google Maps
Abstract: Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of states and demonstration trajectories. In this paper, we introduce scaling techniques based on graph compression, spatial parallelization, and improved initialization conditions inspired by a connection to eigenvector algorithms. We revisit classic IRL methods in the routing context and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. This insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in route quality at a global scale and, to the best of our knowledge, represents the largest published study of IRL algorithms in a real-world setting to date. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.
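To make the horizon trade-off concrete, here is a minimal toy sketch of the idea the abstract describes: stochastic (MaxEnt-style) log-sum-exp backups for the first H steps, with a cheap deterministic shortest-path planner supplying values beyond the horizon. Everything in it is an illustrative assumption reconstructed from the abstract alone, not the authors' implementation: the all-pairs cost matrix, the reward-as-negative-edge-cost model, and the names hard_values, soft_backup, and rhip_values are all hypothetical.

```python
# Toy sketch of a receding-horizon value computation (assumed, not from the paper):
# deterministic shortest-path values beyond horizon H, stochastic backups within it.
import numpy as np
from scipy.special import logsumexp

def hard_values(costs, goal, iters=50):
    """Cheap deterministic planner: shortest-path values via max-backups."""
    V = np.full(costs.shape[0], -np.inf)
    V[goal] = 0.0
    for _ in range(iters):
        Q = -costs + V[None, :]          # Q[s, s'] = -cost(s, s') + V[s']
        V = np.maximum(V, Q.max(axis=1))
        V[goal] = 0.0
    return V

def soft_backup(V, costs, goal):
    """Stochastic (MaxEnt-style) backup: log-sum-exp replaces max."""
    Q = -costs + V[None, :]
    V = logsumexp(Q, axis=1)             # soft maximum over successors
    V[goal] = 0.0
    return V

def rhip_values(costs, goal, H):
    """H = 0 recovers the cheap deterministic planner; large H approaches
    the full stochastic policy, trading compute for robustness."""
    V = hard_values(costs, goal)         # deterministic far field
    for _ in range(H):                   # stochastic near field
        V = soft_backup(V, costs, goal)
    return V

# Tiny road graph: route 0 -> 1 -> 3 costs 2, route 0 -> 2 -> 3 costs 5
# (np.inf marks a missing edge).
inf = np.inf
costs = np.array([[inf, 1.0, 4.0, inf],
                  [inf, inf, inf, 1.0],
                  [inf, inf, inf, 1.0],
                  [inf, inf, inf, inf]])
for H in (0, 1, 4):
    print(f"H={H}:", np.round(rhip_values(costs, goal=3, H=H), 3))
```

At H=0 the values are exact shortest-path costs, so the induced policy commits deterministically to the cheapest route; as H grows, the log-sum-exp backups soften the values near the start state, spreading probability mass over plausible alternative routes (a stochastic policy would pick successor s' with probability proportional to exp(-cost(s, s') + V[s'])). This is only a schematic reading of the trade-off the abstract names; the paper's actual algorithm operates on real road graphs at a vastly larger scale.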
DOI: 10.48550/arxiv.2305.11290