ERCache: An Efficient and Reliable Caching Framework for Large-Scale User Representations in Meta's Ads System
Format: | Article |
Language: | English |
Summary: | The increasing complexity of deep learning models used for calculating user
representations presents significant challenges, particularly with limited
computational resources and strict service-level agreements (SLAs). Previous
research efforts have focused on optimizing model inference but have overlooked
a critical question: is it necessary to perform user model inference for every
ad request in large-scale social networks? To address this question and these
challenges, we first analyze user access patterns at Meta and find that most
user model inferences occur within a short timeframe. This observation reveals
a triangular relationship among model complexity, embedding freshness, and
service SLAs. Building on this insight, we designed, implemented, and evaluated
ERCache, an efficient and robust caching framework for large-scale user
representations in ads recommendation systems on social networks. ERCache
categorizes cache into direct and failover types and applies customized
settings and eviction policies for each model, effectively balancing model
complexity, embedding freshness, and service SLAs, even considering the
staleness introduced by caching. ERCache has been deployed at Meta for over six
months, supporting more than 30 ranking models while efficiently conserving
computational resources and complying with service SLA requirements. |
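The direct/failover split described in the abstract can be illustrated with a minimal sketch. This is not Meta's implementation; the class name, per-model TTL parameters, and the `infer` callback are all assumptions used only to show the idea: a direct cache serves embeddings that are still within a model-specific freshness budget, while a failover cache serves a possibly stale embedding when inference fails, so the service SLA can still be met.

```python
import time

# Hypothetical sketch of a direct/failover embedding cache in the spirit of
# ERCache. All names and parameters here are illustrative assumptions.
class ERCacheSketch:
    def __init__(self, direct_ttls):
        # direct_ttls: per-model freshness budget in seconds, trading
        # embedding staleness against recomputation cost.
        self.direct_ttls = direct_ttls
        self.direct = {}    # (model, user) -> (embedding, stored_at)
        self.failover = {}  # (model, user) -> embedding, kept without a TTL

    def get(self, model, user, infer, now=None):
        now = time.time() if now is None else now
        key = (model, user)
        hit = self.direct.get(key)
        # Direct cache: serve if the entry is still within the model's
        # freshness budget, skipping inference entirely.
        if hit is not None and now - hit[1] <= self.direct_ttls[model]:
            return hit[0], "direct"
        try:
            emb = infer(model, user)
        except Exception:
            # Failover cache: serve a stale embedding rather than miss
            # the service SLA when inference fails or times out.
            if key in self.failover:
                return self.failover[key], "failover"
            raise
        self.direct[key] = (emb, now)
        self.failover[key] = emb
        return emb, "inference"
```

A usage pattern under this sketch: a fresh request triggers inference, a repeat within the TTL hits the direct cache, and a later request whose inference fails falls back to the failover copy.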
DOI: | 10.48550/arxiv.2410.06497 |