Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions
Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Value iteration (VI) is a foundational dynamic programming method, important
for learning and planning in optimal control and reinforcement learning. VI
proceeds in batches, where the update to the value of each state must be
completed before the next batch of updates can begin. Completing a single batch
is prohibitively expensive if the state space is large, rendering VI
impractical for many applications. Asynchronous VI helps to address the large
state space problem by updating one state at a time, in-place and in an
arbitrary order. However, Asynchronous VI still requires a maximization over
the entire action space, making it impractical for domains with large action
space. To address this issue, we propose doubly-asynchronous value iteration
(DAVI), a new algorithm that generalizes the idea of asynchrony from states to
states and actions. More concretely, DAVI maximizes over a sampled subset of
actions that can be of any user-defined size. This simple approach of using
sampling to reduce computation maintains similarly appealing theoretical
properties to VI without the need to wait for a full sweep through the entire
action space in each update. In this paper, we show DAVI converges to the
optimal value function with probability one, converges at a near-geometric rate
with probability 1-delta, and returns a near-optimal policy in computation time
that nearly matches a previously established bound for VI. We also empirically
demonstrate DAVI's effectiveness in several experiments. |
---|---|
DOI: | 10.48550/arxiv.2207.01613 |