Safe Exploration for Efficient Policy Evaluation and Comparison
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate several representative variants of it. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods. |
---|---|
DOI: | 10.48550/arxiv.2202.13234 |
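The abstract does not spell out the exploration policies it derives. As a rough illustration of the kind of computation involved, the sketch below uses a standard setup that is an assumption here, not the paper's stated formulation: a K-armed bandit, the inverse propensity scoring (IPS) estimator, known per-arm reward means and second moments `m2`, and a safety constraint requiring the logging policy's expected reward to stay at least `(1 - eps)` times that of a baseline policy `pi0`. Without the constraint, the per-sample IPS variance `sum_a pi(a)^2 m2(a) / mu(a) - V(pi)^2` is minimized by `mu(a)` proportional to `pi(a) * sqrt(m2(a))` (Cauchy-Schwarz). The hypothetical helper `safe_logging_policy` then mixes that optimum with the baseline. This is a simple heuristic, not the constrained optimum and not the authors' algorithm.

```python
import numpy as np


def ips_variance(pi, mu, m2, value):
    """Per-sample variance of the IPS estimate of V(pi) when arms are drawn from mu.

    Assumes mu(a) > 0 wherever pi(a) > 0 (otherwise IPS is biased).
    """
    support = pi > 0
    return float(np.sum(pi[support] ** 2 * m2[support] / mu[support]) - value ** 2)


def variance_minimizing_logging_policy(pi, m2):
    """Unconstrained minimizer of sum_a pi(a)^2 m2(a) / mu(a) over the simplex.

    By Cauchy-Schwarz the optimum is mu(a) proportional to pi(a) * sqrt(m2(a)).
    """
    weights = pi * np.sqrt(m2)
    return weights / weights.sum()


def safe_logging_policy(pi, pi0, mean_reward, m2, eps):
    """Hypothetical mixing heuristic: blend the variance-minimizing policy with the
    baseline pi0 so that the expected reward stays >= (1 - eps) * V(pi0).

    Assumes nonnegative rewards. Returns (mu, lam) with mu = lam * mu_star + (1 - lam) * pi0.
    """
    mu_star = variance_minimizing_logging_policy(pi, m2)
    r_star = float(mu_star @ mean_reward)
    r0 = float(pi0 @ mean_reward)
    if r_star >= (1 - eps) * r0:
        lam = 1.0  # the unconstrained optimum already satisfies the constraint
    else:
        # Expected reward is linear in lam; take the largest feasible lam,
        # since the (convex) variance decreases as lam moves toward mu_star.
        lam = eps * r0 / (r0 - r_star)
    return lam * mu_star + (1 - lam) * pi0, lam


def ips_estimate(actions, rewards, pi, mu):
    """Inverse propensity scoring estimate of V(pi) from data logged under mu."""
    return float(np.mean(pi[actions] / mu[actions] * rewards))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean_reward = np.array([0.9, 0.5, 0.1])  # Bernoulli arms, so m2 equals the mean
    m2 = mean_reward.copy()
    pi = np.full(3, 1 / 3)                    # target policy to evaluate
    pi0 = np.array([0.8, 0.1, 0.1])           # baseline (production) policy
    mu, lam = safe_logging_policy(pi, pi0, mean_reward, m2, eps=0.1)

    n = 100_000
    actions = rng.choice(3, size=n, p=mu)
    rewards = rng.binomial(1, mean_reward[actions]).astype(float)
    value = float(pi @ mean_reward)
    print(f"lam={lam:.3f}, mu={np.round(mu, 3)}")
    print(f"logging reward={mu @ mean_reward:.3f} (floor {0.9 * pi0 @ mean_reward:.3f})")
    print(f"IPS variance: baseline {ips_variance(pi, pi0, m2, value):.3f}, "
          f"safe {ips_variance(pi, mu, m2, value):.3f}")
    print(f"IPS estimate {ips_estimate(actions, rewards, pi, mu):.3f} vs true {value:.3f}")
```

In this toy instance the target policy itself (uniform) earns less than the safety floor, so running it on-policy would violate the constraint; the mixed logging policy meets the floor exactly while giving a much lower IPS variance than logging with the baseline alone.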