Data Caching for Enterprise-Grade Petabyte-Scale OLAP
Main authors: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | With the exponential growth of data and evolving use cases, petabyte-scale
OLAP data platforms are increasingly adopting a model that decouples compute
from storage. This shift, evident in organizations like Uber and Meta,
introduces operational challenges including massive, read-heavy I/O traffic
with potential throttling, as well as skewed and fragmented data access
patterns. Addressing these challenges, this paper introduces the Alluxio local
(edge) cache, a highly effective architectural optimization tailored for such
environments. This embeddable cache, optimized for petabyte-scale data
analytics, leverages local SSD resources to alleviate network I/O and API call
pressures, significantly improving data transfer efficiency. Integrated with
OLAP systems like Presto and storage services like HDFS, the Alluxio local
cache has demonstrated its effectiveness in handling large-scale,
enterprise-grade workloads over three years of deployment at Uber and Meta. We
share insights and operational experiences in implementing these optimizations,
providing valuable perspectives on managing modern, massive-scale OLAP
workloads. |
---|---|
DOI: | 10.48550/arxiv.2406.05962 |