Conservative Contextual Combinatorial Cascading Bandit
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | A conservative mechanism is a desirable property in decision-making
problems that balance the tradeoff between exploration and exploitation. We
propose the novel \emph{conservative contextual combinatorial cascading bandit
($C^4$-bandit)}, a cascading online learning game that incorporates the
conservative mechanism. At each time step, the learning agent is given some
contexts and must recommend a list of items that performs no worse than the
base strategy; it then observes the reward according to a stopping rule. We
design the $C^4$-UCB algorithm to solve the problem and prove its n-step regret
upper bound in two situations: known baseline reward and unknown baseline
reward. The regret in both situations can be decomposed into two terms: (a) the
upper bound for the general contextual combinatorial cascading bandit; and (b)
a constant term for the regret incurred by the conservative mechanism. As a
by-product, we also improve the bound for the conservative contextual
combinatorial bandit. Experiments on synthetic data demonstrate the advantages
of $C^4$-UCB and validate our theoretical analysis. |
---|---|
DOI: | 10.48550/arxiv.2104.08615 |
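
The summary mentions two mechanisms that a small sketch may make more concrete: feedback that is cut off by a stopping rule, and a conservative check against a base strategy. The Python below is only an illustration under stated assumptions and is not the $C^4$-UCB algorithm from the paper. It assumes a stopping rule that halts at the first clicked item, a known per-round baseline reward, and the constraint form commonly used in the conservative bandit literature (pessimistic cumulative reward at least a `1 - alpha` fraction of the baseline's cumulative reward). All function and parameter names are hypothetical.

```python
def cascade_feedback(clicks):
    """Return the part of the recommended list that the agent observes.

    Assumption: the user scans the list in order and stops at the first
    clicked item (1), so later positions stay unobserved.
    """
    for position, clicked in enumerate(clicks):
        if clicked:
            return list(clicks[: position + 1])
    # No click: the whole list was examined.
    return list(clicks)


def choose_conservatively(candidate_lower_bound, cumulative_lower_bound,
                          baseline_reward, rounds_played, alpha):
    """Decide whether to play the exploratory candidate or the base strategy.

    Assumption: the constraint is that the pessimistic cumulative reward
    after this round must be at least (1 - alpha) times what the base
    strategy would have collected over the same number of rounds.
    """
    required = (1.0 - alpha) * baseline_reward * (rounds_played + 1)
    if cumulative_lower_bound + candidate_lower_bound >= required:
        return "candidate"
    return "baseline"


if __name__ == "__main__":
    print(cascade_feedback([0, 0, 1, 0, 1]))
    print(choose_conservatively(0.2, 4.0, 0.5, 9, 0.1))
```

In this sketch a candidate list with a weak pessimistic estimate falls back to the base strategy, while one with a sufficiently strong estimate is played. The paper's setting with an unknown baseline reward would additionally require estimating `baseline_reward`, which is not shown here.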