METHODS AND SYSTEMS FOR SELECTING ACTIONS FROM A SET OF ACTIONS TO BE PERFORMED IN AN ENVIRONMENT AFFECTED BY DELAYS
Main authors: | , , |
---|---|
Format: | Patent |
Language: | eng ; fre ; ger |
Summary: | A method of selecting an action from a plurality of actions to be performed in an environment comprises: maintaining, for each action, count data indicative of a number of times the action has been performed and a difference between the number of times and a number of observed resulting rewards for the action, each reward being a numeric value that measures an outcome of a given action; determining, from the count data and a bandit score provided by a bandit model, an expected score for each action, the bandit score being provided by the bandit model for a given history of performed actions and observed rewards, and the expected score being determined by determining an expected value of the bandit score given a likelihood of some of the actions having unobserved pending rewards; and selecting the action from the actions based on the expected score for each action. |
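The three steps in the abstract (maintaining count data, determining an expected score, selecting an action) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes Bernoulli (0/1) rewards, uses UCB1 as a stand-in for the unspecified bandit model, and treats each pending reward as a success with a Laplace-smoothed probability estimated from the rewards observed so far. The names `ArmStats`, `ucb1_score`, `expected_score`, and `select_action` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Count data for one action (hypothetical container)."""
    pulls: int = 0           # times the action has been performed
    observed: int = 0        # rewards observed so far
    reward_sum: float = 0.0  # sum of observed rewards

    @property
    def pending(self) -> int:
        # Difference between performed count and observed-reward count.
        return self.pulls - self.observed


def ucb1_score(reward_sum: float, n: int, t: int) -> float:
    """UCB1 score for a history with n rewards (assumed bandit model)."""
    if n == 0:
        return math.inf  # untried actions are explored first
    return reward_sum / n + math.sqrt(2.0 * math.log(max(t, 1)) / n)


def expected_score(arm: ArmStats, t: int) -> float:
    """Expected bandit score over the possible outcomes of pending rewards.

    Assumption: each pending reward is 1 with probability p, where p is a
    Laplace-smoothed estimate from the observed rewards.
    """
    m = arm.pending
    p = (arm.reward_sum + 1.0) / (arm.observed + 2.0)
    total = 0.0
    for k in range(m + 1):  # k = hypothetical successes among pending rewards
        weight = math.comb(m, k) * p**k * (1.0 - p) ** (m - k)
        total += weight * ucb1_score(arm.reward_sum + k, arm.observed + m, t)
    return total


def select_action(arms: list[ArmStats], t: int) -> int:
    """Return the index of the action with the highest expected score."""
    return max(range(len(arms)), key=lambda i: expected_score(arms[i], t))
```

In this sketch, an action with pending rewards gets a smaller exploration bonus than the same action with no pending rewards, because the pending pulls are counted as if their outcomes had already arrived. A caller would increment `pulls` when an action is performed, and update `observed` and `reward_sum` when its delayed reward is received.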