A knowledge distilled attention-based latent information extraction network for sequential user behavior
When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face the dual issues of a) long-distance dependencies in conjunction with b) high levels of noise. In addition, with the complexity of current recommendation model architectures there is a signi...
Gespeichert in:
Veröffentlicht in: | Multimedia tools and applications 2023, Vol.82 (1), p.1017-1043 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face the dual issues of a) long-distance dependencies in conjunction with b) high levels of noise. In addition, with the complexity of current recommendation model architectures there is a significant increase in computation time. Therefore, these models cannot meet the requirement of fast response needed in application scenarios such as online advertising. To deal with these issues, we propose a Knowledge Distilled Attention-based Latent Information Extraction Network for Sequential user behavior (KD-ALIENS). In this model structure, user and item attributes and history are utilized to model the latent information from high-order feature interactions in conjunction with user sequential historical behavior. With regard to the issues of long-distance dependency and noise, we have adopted the self-attention mechanism to learn the sequential patterns between items in a user-item interaction history. With regard to the issue of a complex model architecture which cannot meet the requirement of fast response, the use of model compression and acceleration is realized by: (a) use of a knowledge-distilled teacher and student module, wherein the complex teacher module extracts a user’s general preference from high-order feature interactions and sequential patterns of long history sequences; and (b) a sampling method to sample both the relatively long-term and short-term item histories. Experimental studies on two real-world datasets demonstrate considerable improvements for click-through rate (CTR) prediction accuracy relative to strong baseline models and also show the effectiveness of the student-model compression and acceleration for speed. |
---|---|
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-022-12513-y |