Clustering of conversational bandits with posterior sampling for user preference learning and elicitation

Bibliographic details
Published in: User modeling and user-adapted interaction, 2023-11, Vol. 33 (5), p. 1065-1112
Main authors: Li, Qizhi; Zhao, Canzhe; Yu, Tong; Wu, Junda; Li, Shuai
Format: Article
Language: English
Description
Summary: Conversational recommender systems elicit user preference via conversational interactions. By introducing conversational key-terms, existing conversational recommenders can effectively reduce the need for the extensive exploration required by a traditional interactive recommender. However, existing conversational recommender approaches that elicit user preference via key-terms still have limitations. First, the key-term data of the items needs to be carefully labeled, which requires considerable human effort. Second, the number of human-labeled key-terms is limited and their granularity is fixed, while the elicited user preference usually moves from coarse-grained to fine-grained during the conversations. In this paper, we propose a clustering of conversational bandits algorithm. To avoid the human labeling effort and automatically learn key-terms with the proper granularity, we cluster the items online and generate meaningful key-terms for the items during the conversational interactions. Our algorithm is general and can also be used for user clustering when feedback from multiple users is available, which further leads to more accurate learning and generation of conversational key-terms. Moreover, to learn the user clustering structure more efficiently when that structure is more complex, we further propose a simple yet effective soft user clustering module that performs exploration on user clustering by sampling the posterior user representations. We analyze the regret bound of our learning algorithm. In the empirical evaluations, without using any human-labeled key-terms, our algorithm effectively generates meaningful coarse-to-fine-grained key-terms and performs as well as or better than the state-of-the-art baseline.
ISSN: 0924-1868, 1573-1391
DOI: 10.1007/s11257-023-09358-x
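
The summary mentions a soft user clustering module that explores user clusterings by sampling posterior user representations. The record gives no algorithmic detail, so the following is only a generic illustration of that idea and not the authors' method. It assumes each user has a diagonal Gaussian posterior over a preference vector, and that users are grouped when their sampled vectors lie within a Euclidean threshold of the target user's sample. The function names, user names, and the threshold are all hypothetical.

```python
import math
import random


def sample_representation(mean, var, rng):
    """Draw one vector from a diagonal Gaussian posterior N(mean, diag(var)).

    Assumption: independent dimensions; a full covariance matrix is also possible.
    """
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


def soft_cluster(target, posteriors, threshold, rng):
    """Return the users whose sampled vectors lie within `threshold` of the target's sample.

    Membership is stochastic: a user with an uncertain (high-variance) posterior
    joins the cluster in some rounds and not in others, which is what makes the
    clustering "soft" in this sketch.
    """
    samples = {
        user: sample_representation(mean, var, rng)
        for user, (mean, var) in posteriors.items()
    }
    centre = samples[target]
    return {user for user, s in samples.items() if math.dist(s, centre) <= threshold}


# Hypothetical posteriors: user -> (mean vector, per-dimension variance).
posteriors = {
    "alice": ([1.0, 0.0], [0.01, 0.01]),
    "bob": ([0.9, 0.1], [0.01, 0.01]),
    "carol": ([-1.0, 0.5], [0.01, 0.01]),
}

rng = random.Random(0)
cluster = soft_cluster("alice", posteriors, threshold=1.0, rng=rng)
print(sorted(cluster))
```

Because the grouping is recomputed from fresh samples on every call, repeated calls explore different plausible clusterings while the posteriors are wide and settle on a stable one as the variances shrink.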