IMITATION LEARNING USING A GENERATIVE PREDECESSOR NEURAL NETWORK
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network. In one aspect, a method comprises: obtaining an expert observation; processing the expert observation using a generative neural network system to...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Patent |
Sprache: | eng ; fre ; ger |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network. In one aspect, a method comprises: obtaining an expert observation; processing the expert observation using a generative neural network system to generate a given observation - given action pair, wherein the generative neural network system has been trained to be more likely to generate a particular observation - particular action pair if performing the particular action in response to the particular observation is more likely to result in the environment later reaching the state characterized by a target observation; processing the given observation using the action selection policy neural network to generate a given action score for the given action; and adjusting the current values of the action selection policy neural network parameters to increase the given action score for the given action. |
---|