Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | The proliferation of online micro-video platforms has underscored the
necessity for advanced recommender systems to mitigate information overload and
deliver tailored content. Despite advancements, accurately and promptly
capturing dynamic user interests remains a formidable challenge. Inspired by
the Platonic Representation Hypothesis, which posits that different data
modalities converge towards a shared statistical model of reality, we introduce
DreamUMM (Dreaming User Multi-Modal Representation), a novel approach
leveraging user historical behaviors to create real-time user representations in
a multimodal space. DreamUMM employs a closed-form solution correlating user
video preferences with multimodal similarity, hypothesizing that user interests
can be effectively represented in a unified multimodal space. Additionally, we
propose Candidate-DreamUMM for scenarios lacking recent user behavior data,
inferring interests from candidate videos alone. Extensive online A/B tests
demonstrate significant improvements in user engagement metrics, including
active days and play count. The successful deployment of DreamUMM on two
micro-video platforms with hundreds of millions of daily active users
illustrates its practical efficacy and scalability in personalized micro-video
content delivery. Our work contributes to the ongoing exploration of
representational convergence by providing empirical evidence supporting the
potential for user interest representations to reside in a multimodal space. |
---|---|
DOI: | 10.48550/arxiv.2410.03538 |
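
The summary mentions a closed-form solution that ties user video preferences to multimodal similarity, but this record does not give the formula. The following is only an illustrative sketch under a stated assumption: if the user vector is chosen on the unit sphere to maximize the preference-weighted inner product with the embeddings of watched videos, the maximizer is the normalized weighted sum of those embeddings. The function names, the weights, and the objective itself are hypothetical and may differ from what the paper actually uses.

```python
import math


def dream_user_vector(video_embeddings, preference_weights):
    """Unit vector u maximizing sum_i w_i * <u, v_i> (assumed objective).

    video_embeddings: list of equal-length float lists (multimodal video vectors).
    preference_weights: one float per video, e.g. derived from watch time.
    Both inputs are hypothetical stand-ins, not quantities defined in the record.
    """
    if not video_embeddings:
        raise ValueError("at least one watched video is required")
    if len(video_embeddings) != len(preference_weights):
        raise ValueError("one weight per video embedding is required")

    dim = len(video_embeddings[0])
    weighted_sum = [0.0] * dim
    for embedding, weight in zip(video_embeddings, preference_weights):
        for k in range(dim):
            weighted_sum[k] += weight * embedding[k]

    norm = math.sqrt(sum(x * x for x in weighted_sum))
    if norm == 0.0:
        raise ValueError("weighted sum is zero, so no direction is preferred")
    # Normalizing the weighted sum gives the maximizer on the unit sphere.
    return [x / norm for x in weighted_sum]


def similarity(user_vector, candidate_embedding):
    """Inner product between the user vector and a candidate video embedding."""
    return sum(u * c for u, c in zip(user_vector, candidate_embedding))


# Two watched videos in a toy 2-D space; the first is weighted more heavily.
history = [[1.0, 0.0], [0.0, 1.0]]
weights = [3.0, 1.0]
user = dream_user_vector(history, weights)
score_a = similarity(user, [1.0, 0.0])
score_b = similarity(user, [0.0, 1.0])
```

In this toy setup the candidate aligned with the more heavily weighted video receives the higher score, which is the behavior one would expect from a representation built out of recent preferences.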