Price of Safety in Linear Best Arm Identification
We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector. The agent must take actions in a conservative way so as to ensure that the safety constraint is not viol...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to some stage-wise safety constraint that linearly
depends on an unknown parameter vector. The agent must take actions in a
conservative way so as to ensure that the safety constraint is not violated
with high probability at each round. Ways of leveraging the linear structure
for ensuring safety has been studied for regret minimization, but not for
best-arm identification to the best our knowledge. We propose a gap-based
algorithm that achieves meaningful sample complexity while ensuring the
stage-wise safety. We show that we pay an extra term in the sample complexity
due to the forced exploration phase incurred by the additional safety
constraint. Experimental illustrations are provided to justify the design of
our algorithm. |
---|---|
DOI: | 10.48550/arxiv.2309.08709 |