HIERARCHICAL REINFORCEMENT LEARNING AT SCALE
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | eng ; fre ; ger |
Subjects: | |
Online access: | Order full text |
Summary: | The invention describes a system and a method for controlling an agent interacting with an environment to perform a task, the method comprising, at each of a plurality of first time steps from a plurality of time steps: receiving an observation characterizing a state of the environment at the first time step; determining a goal representation for the first time step that characterizes a goal state of the environment to be reached by the agent; processing the observation and the goal representation using a low-level controller neural network to generate a low-level policy output that defines an action to be performed by the agent in response to the observation, wherein the low-level controller neural network comprises: a representation neural network configured to process the observation to generate an internal state representation of the observation, and a low-level policy head configured to process the internal state representation and the goal representation to generate the low-level policy output; and controlling the agent using the low-level policy output. |
---|
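The data flow in the summary (observation to internal state representation, then state representation plus goal representation to a policy output) can be illustrated with a small sketch. This is not the patented implementation: the class name `LowLevelController`, the single-layer networks, the layer sizes, the `tanh` activation, and the choice of a discrete action space with a softmax output are all assumptions made for illustration. The summary does not say how the goal representation is determined, so the sketch simply takes it as an input.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LowLevelController:
    """Hypothetical goal-conditioned controller: representation net + policy head."""

    def __init__(self, obs_dim, goal_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # Representation network: observation -> internal state representation.
        self.repr_layer = init_layer(rng, obs_dim, hidden_dim)
        # Policy head: [state representation, goal representation] -> action logits.
        self.policy_head = init_layer(rng, hidden_dim + goal_dim, num_actions)

    def represent(self, observation):
        w, b = self.repr_layer
        return [math.tanh(v) for v in linear(w, b, observation)]

    def policy(self, observation, goal):
        """Low-level policy output: a probability for each discrete action (assumed)."""
        state = self.represent(observation)
        w, b = self.policy_head
        return softmax(linear(w, b, state + goal))

    def act(self, observation, goal):
        """Control the agent by picking the most probable action."""
        probs = self.policy(observation, goal)
        return max(range(len(probs)), key=probs.__getitem__)


# One control step; the goal vector stands in for whatever component
# determines the goal representation at this time step.
controller = LowLevelController(obs_dim=4, goal_dim=2, hidden_dim=8, num_actions=3)
observation = [0.1, -0.2, 0.3, 0.0]
goal = [1.0, 0.0]
probs = controller.policy(observation, goal)
action = controller.act(observation, goal)
```

Keeping the representation network separate from the policy head means the goal only enters at the head, which matches the ordering in the summary; whether the actual system shares the representation with other components is not stated there.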