Online Limited Memory Neural-Linear Bandits with Likelihood Matching
Saved in:
Format: Article
Language: English
Online access: Order full text
Abstract: We study neural-linear bandits for solving problems where {\em both}
exploration and representation learning play an important role. Neural-linear
bandits harness the representation power of Deep Neural Networks (DNNs) and
combine it with efficient exploration mechanisms by leveraging uncertainty
estimation of the model, performed by a linear contextual bandit on top of the
last hidden layer. To mitigate the problem of representation change during the
process, new uncertainty estimates are computed using data stored in an
unlimited buffer. However, when the amount of stored data is limited, a
phenomenon called catastrophic forgetting emerges. To alleviate this, we
propose a likelihood matching algorithm that is resilient to catastrophic
forgetting and is completely online. We applied our algorithm, Limited Memory
Neural-Linear with Likelihood Matching (NeuralLinear-LiM2), to a variety of
datasets and observed that it achieves performance comparable to the
unlimited-memory approach while exhibiting resilience to catastrophic
forgetting.
DOI: 10.48550/arxiv.2102.03799
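
The abstract above describes the general neural-linear recipe: a DNN learns a representation, and a Bayesian linear model over its last-hidden-layer features supplies the uncertainty estimates used for exploration (for example via Thompson sampling). The Python sketch below illustrates only that generic recipe, not the paper's NeuralLinear-LiM2 algorithm; the class name, network sizes, prior, and noise variance are assumptions chosen for illustration.

```python
# Illustrative sketch of a generic neural-linear Thompson-sampling bandit.
# This is NOT the paper's NeuralLinear-LiM2 method; it only shows the idea of
# a linear contextual bandit on top of a DNN's last hidden layer.
import numpy as np
import torch
import torch.nn as nn


class NeuralLinearTS:
    def __init__(self, context_dim, n_arms, feat_dim=32, noise_var=0.25):
        # DNN feature extractor; its last-hidden-layer output is the feature
        # map for the per-arm Bayesian linear regression heads.
        # (Periodic training of the DNN on logged rewards is omitted here.)
        self.body = nn.Sequential(
            nn.Linear(context_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.noise_var = noise_var
        # Per-arm ridge-regression statistics: precision = I + sum(phi phi^T),
        # xty = sum(reward * phi), under a standard Gaussian prior.
        self.precision = [np.eye(feat_dim) for _ in range(n_arms)]
        self.xty = [np.zeros(feat_dim) for _ in range(n_arms)]

    def _features(self, context):
        with torch.no_grad():
            x = torch.as_tensor(context, dtype=torch.float32)
            return self.body(x).numpy()

    def select_arm(self, context):
        # Thompson sampling: draw a weight vector from each arm's posterior
        # and choose the arm with the highest sampled reward estimate.
        phi = self._features(context)
        scores = []
        for prec, xty in zip(self.precision, self.xty):
            cov = np.linalg.inv(prec)
            mean = cov @ xty
            w = np.random.multivariate_normal(mean, self.noise_var * cov)
            scores.append(float(phi @ w))
        return int(np.argmax(scores))

    def update(self, context, arm, reward):
        # Online posterior update for the pulled arm only.
        phi = self._features(context)
        self.precision[arm] += np.outer(phi, phi)
        self.xty[arm] += reward * phi
```

Note that these per-arm posterior statistics are tied to the current feature map, so retraining the DNN changes the representation and invalidates them; recomputing them normally requires replaying stored data, which is the limited-memory issue that the paper's likelihood matching is designed to handle online.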