Bandit algorithms: A comprehensive review and their dynamic selection from a portfolio for multicriteria top-k recommendation

This paper discusses the use of portfolio approaches based on bandit algorithms to optimize multicriteria decision-making in recommender systems (accuracy and diversity). While previous research has primarily focused on single-item recommendations, this study extends the research to consider the rec...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Expert systems with applications 2024-07, Vol.246, p.123151, Article 123151
Hauptverfasser:	Letard, Alexandre, Gutowski, Nicolas, Camp, Olivier, Amghar, Tassadit
Format:	Artikel
Sprache:	eng
Schlagworte:	Application of reinforcement learning Computer Science Contextual multi-armed bandit Multi-armed bandit Portfolio approach Recommendation systems Top-k recommendation
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This paper discusses the use of portfolio approaches based on bandit algorithms to optimize multicriteria decision-making in recommender systems (accuracy and diversity). While previous research has primarily focused on single-item recommendations, this study extends the research to consider the recommendation of several items per iteration. Two methods, Multiple-play Gorthaur and Budgeted-Gorthaur, are proposed to solve the algorithm selection problem and their performances on real-world datasets are compared. Both methods provide a generalization of the Gorthaur method, which enables it to operate with any Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB) algorithm as meta-algorithm in a multi-item recommendation scenario. For Multiple-play Gorthaur, an empirical evaluation shows that the use of Thompson Sampling for algorithm selection (Gorthaur-TS) yields better results than the original EXP3 method (Gorthaur-EXP3) and the exclusive use of the optimal algorithm in the portfolio in contextual recommendation problems. Additionally, the paper includes a theoretical regret analysis based on the TS sketch proof applied for this variant of the method. Concerning Budgeted-Gorthaur, experiments show that it allows more flexibility to achieve a suitable trade-off between criteria and a broader coverage of the Pareto set of solutions, overcoming a natural limit of “a-priori” methods. Finally, this paper provides a detailed review, including pseudocodes and theoretical bounds, for all the fundamental MAB and CMAB algorithms used in this study. •Bandit literature lacks formal algorithm review, hindering clarity and comparability.•There is no silver bullet: no algorithm can be the best performer in every instance.•Recommender systems need to balance accuracy, diversity, multi-item recommendations.•Optimal algorithm balances criteria, matching decision maker’s preferred trade-off.•Dynamic selection ensures safe performance when optimal algorithm is unknown.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2024.123151