A knowledge distilled attention-based latent information extraction network for sequential user behavior

Bibliographic details
Published in: Multimedia Tools and Applications, 2023, Vol. 82 (1), p. 1017-1043
Main authors: Huang, Ruo; McIntyre, Shelby; Song, Meina; E, Haihong; Ou, Zhonghong
Format: Article
Language: English
Description
Abstract: When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face the dual issues of (a) long-distance dependencies in conjunction with (b) high levels of noise. In addition, the complexity of current recommendation model architectures brings a significant increase in computation time. Therefore, these models cannot meet the requirement of fast response needed in application scenarios such as online advertising. To deal with these issues, we propose a Knowledge Distilled Attention-based Latent Information Extraction Network for Sequential user behavior (KD-ALIENS). In this model structure, user and item attributes and history are utilized to model the latent information from high-order feature interactions in conjunction with user sequential historical behavior. With regard to the issues of long-distance dependency and noise, we have adopted the self-attention mechanism to learn the sequential patterns between items in a user-item interaction history. With regard to the issue of a complex model architecture that cannot meet the requirement of fast response, model compression and acceleration are realized by: (a) a knowledge-distilled teacher and student module, wherein the complex teacher module extracts a user's general preference from high-order feature interactions and sequential patterns of long history sequences; and (b) a sampling method to sample both the relatively long-term and short-term item histories. Experimental studies on two real-world datasets demonstrate considerable improvements in click-through rate (CTR) prediction accuracy relative to strong baseline models and also show the effectiveness of the student-model compression and acceleration for speed.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-022-12513-y
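
The abstract describes a teacher and student arrangement but does not state the loss used to train the student. As a rough illustration only, the sketch below shows a generic temperature-softened distillation loss for a binary CTR output. It is not the KD-ALIENS objective: the function names, the `temperature` and `alpha` parameters, and the weighting scheme are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping logits to click probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(target, prob, eps=1e-7):
    """Mean binary cross-entropy; target may be hard (0/1) or soft (in [0, 1])."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return float(np.mean(losses))


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical student objective (an assumption, not taken from the paper).

    alpha weights the hard-label term against the term that matches the
    teacher's temperature-softened click probabilities.
    """
    student_logits = np.asarray(student_logits, dtype=float)
    teacher_logits = np.asarray(teacher_logits, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Hard term: student predictions against observed clicks.
    hard = binary_cross_entropy(labels, sigmoid(student_logits))

    # Soft term: student against the teacher's softened probabilities.
    soft_targets = sigmoid(teacher_logits / temperature)
    soft = binary_cross_entropy(soft_targets,
                                sigmoid(student_logits / temperature))

    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    labels = [1, 0, 1]
    teacher = [2.0, -2.0, 1.5]   # made-up teacher logits
    student = [1.0, -0.5, 0.2]   # made-up student logits
    print(distillation_loss(student, teacher, labels))
```

With `alpha=1.0` the soft term drops out and the student is trained on the click labels alone. Lowering `alpha` moves the objective toward imitating the teacher.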