CIPL: Counterfactual Interactive Policy Learning to Eliminate Popularity Bias for Online Recommendation

Popularity bias, as a long-standing problem in recommender systems (RSs), has been fully considered and explored for offline recommendation systems in most existing relevant researches, but very few studies have paid attention to eliminate such bias in online interactive recommendation scenarios. Bi...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transaction on neural networks and learning systems 2023-08, Vol.35 (12), p.17123-17136
Hauptverfasser:	Zheng, Yongsen, Qin, Jinghui, Wei, Pengxu, Chen, Ziliang, Lin, Liang
Format:	Artikel
Sprache:	eng
Schlagworte:	Counterfactual inference Electronic mail Feedback loop Inference algorithms interactive recommendation system online learning popularity bias Prediction algorithms Real-time systems Recommender systems temporal causal graph (TCG) Training
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Popularity bias, as a long-standing problem in recommender systems (RSs), has been fully considered and explored for offline recommendation systems in most existing relevant researches, but very few studies have paid attention to eliminate such bias in online interactive recommendation scenarios. Bias amplification will become increasingly serious over time due to the existence of feedback loop between the user and the interactive system. However, existing methods have only investigated the causal relations among different factors statically without considering temporal dependencies inherent in the online interactive recommendation system, making them difficult to be adapted to online settings. To address these problems, we propose a novel counterfactual interactive policy learning (CIPL) method to eliminate popularity bias for online recommendation. It first scrutinizes the causal relations in the interactive recommender models and formulates a novel temporal causal graph (TCG) to guide the training and counterfactual inference of the causal interactive recommendation system. Concretely, TCG is used to estimate the causal relations of item popularity on prediction score when the user interacts with the system at each time during model training. Besides, it is also used to remove the negative effect of popularity bias in the test stage. To train the causal interactive recommendation system, we formulated our CIPL by the actor-critic framework with an online interactive environment simulator. We conduct extensive experiments on three public benchmarks and the experimental results demonstrate that our proposed method can achieve the new state-of-the-art performance.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2023.3299929