OPTIMAL DEPLOYMENT OF EMBEDDINGS TABLES ACROSS HETEROGENEOUS MEMORY ARCHITECTURE FOR HIGH-SPEED RECOMMENDATIONS INFERENCE

Works in the literature fail to leverage embedding access patterns and memory units' access/storage capabilities, which when combined can yield high-speed heterogeneous systems by dynamically re-organizing embedding tables partitions across hardware during inference. A method and system for opt...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Mahajan, Chinmay Narendra, KRISHNAN, Ashwin, Singhal, Rekha, Nambiar, Manoj Karunakaran
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Works in the literature fail to leverage embedding access patterns and memory units' access/storage capabilities, which when combined can yield high-speed heterogeneous systems by dynamically re-organizing embedding tables partitions across hardware during inference. A method and system for optimal deployment of embeddings tables across heterogeneous memory architecture for high-speed recommendations inference is disclosed, which dynamically partitions and organizes embedding tables across fast memory architectures to reduce access time. Partitions are chosen to take advantage of the past access patterns of those tables to ensure that frequently accessed data is available in the fast memory most of the time. Partition and replication is used to co-optimize memory access time and resources. Dynamic organization of embedding tables changes location of embedding, hence needs an efficient mechanism to track if a required embedding is present in the fast memory with its current address for faster look-up, which is performed using spline-based learned index.