METHODS AND SYSTEMS FOR SELECTING ACTIONS FROM A SET OF ACTIONS TO BE PERFORMED IN AN ENVIRONMENT AFFECTED BY DELAYS

A method of selecting an action from a plurality of actions to be performed in an environment comprises maintaining, for each action, count data indicative of a number of times the action has been performed and a difference between the number of times and a number of observed resulting rewards for t...

Bibliographic details
Main authors: VARRO, Daniel, PILARSKI, Sebastian, PILARSKI, Slawomir
Format: Patent
Language: eng; fre; ger
Description
Summary: A method of selecting an action from a plurality of actions to be performed in an environment comprises: maintaining, for each action, count data indicative of a number of times the action has been performed and a difference between the number of times and a number of observed resulting rewards for the action, each reward being a numeric value that measures an outcome of a given action; determining, from the count data and a bandit score provided by a bandit model, an expected score for each action, the bandit score being provided by the bandit model for a given history of performed actions and observed rewards, and the expected score being determined by determining an expected value of the bandit score given a likelihood of some of the actions having unobserved pending rewards; and selecting the action from the actions based on the expected score for each action.
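
The three steps in the summary (maintain count data, compute an expected score over pending rewards, select an action) can be illustrated with a minimal Python sketch. It is not the patented implementation. It rests on several assumptions that the record does not state: rewards are taken to be Bernoulli (0 or 1), the bandit model is stood in for by UCB1, the number of successes among pending rewards is modelled as binomial with a Laplace-smoothed success estimate, and each action is treated independently. The names ArmCounts, ucb1_score, expected_score and select_action are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmCounts:
    """Count data for one action (hypothetical structure)."""
    pulls: int = 0      # number of times the action has been performed
    pending: int = 0    # pulls minus number of observed rewards
    successes: int = 0  # observed rewards equal to 1 (Bernoulli assumption)


def ucb1_score(successes: int, observations: int, total: int) -> float:
    """Stand-in bandit score (UCB1) for a fully observed history."""
    if observations == 0:
        return math.inf  # an action never tried is always worth trying
    mean = successes / observations
    return mean + math.sqrt(2.0 * math.log(max(total, 1)) / observations)


def expected_score(arm: ArmCounts, total_pulls: int) -> float:
    """Expected bandit score, averaging over possible pending outcomes."""
    observed = arm.pulls - arm.pending
    # Assumed likelihood model: each pending reward is 1 with probability p,
    # where p is a Laplace-smoothed estimate from the observed rewards.
    p = (arm.successes + 1) / (observed + 2)
    score = 0.0
    for k in range(arm.pending + 1):
        # Binomial probability that k of the pending rewards turn out to be 1.
        weight = math.comb(arm.pending, k) * p**k * (1 - p) ** (arm.pending - k)
        # Score the history as if all pending rewards had been observed.
        score += weight * ucb1_score(arm.successes + k, arm.pulls, total_pulls)
    return score


def select_action(arms: list[ArmCounts]) -> int:
    """Return the index of the action with the highest expected score."""
    total = sum(a.pulls for a in arms)
    scores = [expected_score(a, total) for a in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

When an action has no pending rewards, the loop has a single term with weight 1, so the expected score reduces to the plain bandit score. With pending rewards, the exploration bonus already counts the outstanding pulls, which discourages repeatedly selecting an action whose results have simply not arrived yet.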