Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats
Format: Article
Language: English
Abstract: The proliferation of modern data processing tools has given rise to
open-source columnar data formats. The advantage of these formats is that they
help organizations avoid repeatedly converting data to a new format for each
application. These formats, however, are read-only, and organizations must use
a heavyweight transformation process to load data from on-line transactional
processing (OLTP) systems. We aim to reduce or even eliminate this process by
developing a storage architecture for in-memory database management systems
(DBMSs) that is aware of the eventual usage of its data and emits columnar
storage blocks in a universal open-source format. We introduce relaxations to
common analytical data formats to efficiently update records and rely on a
lightweight transformation process to convert blocks to a read-optimized layout
when they are cold. We also describe how to access data from third-party
analytical tools with minimal serialization overhead. To evaluate our work, we
implemented our storage engine based on the Apache Arrow format and integrated
it into the DB-X DBMS. Our experiments show that our approach achieves
performance comparable to dedicated OLTP DBMSs while enabling
orders-of-magnitude faster data exports to external data science and machine
learning tools than existing methods.
DOI: 10.48550/arxiv.2004.14471
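The abstract's core idea, keeping hot data in an update-friendly layout and converting cold blocks into a read-optimized columnar one, can be illustrated for a single variable-length column with the minimal Python sketch below. It assumes that a hot block stores each value as a separate bytes object (so an update touches only one slot) and that "freezing" a cold block gathers those values into Apache Arrow's variable-size binary layout: a validity bitmap, an offsets array, and one contiguous data buffer. The names `HotVarlenColumn`, `freeze`, and `read_value` are hypothetical. Offsets are kept as a Python list rather than an int32 buffer, and Arrow's alignment and padding rules are not modeled, so this is a sketch under stated assumptions, not the paper's implementation.

```python
from typing import List, Optional, Tuple


class HotVarlenColumn:
    """Hypothetical update-friendly ("hot") column of variable-length values.

    Each slot holds its own bytes object, or None for NULL, so an update
    overwrites one slot without shifting or rewriting any neighbouring data.
    """

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[bytes]] = [None] * num_slots

    def update(self, slot: int, value: Optional[bytes]) -> None:
        # In-place update: only this slot changes.
        self.slots[slot] = value

    def freeze(self) -> Tuple[bytearray, List[int], bytes]:
        """Gather all slots into an Arrow-style variable-size binary layout.

        Returns (validity, offsets, data):
          - validity bit i (LSB-first within each byte) is 1 if slot i is non-NULL,
          - offsets has len(slots) + 1 entries, starting at 0,
          - value i is data[offsets[i]:offsets[i + 1]]; NULL slots get a
            zero-length range.
        """
        n = len(self.slots)
        validity = bytearray((n + 7) // 8)
        offsets = [0]
        chunks: List[bytes] = []
        total = 0
        for i, value in enumerate(self.slots):
            if value is not None:
                validity[i // 8] |= 1 << (i % 8)
                chunks.append(value)
                total += len(value)
            offsets.append(total)
        return validity, offsets, b"".join(chunks)


def read_value(validity: bytearray, offsets: List[int], data: bytes,
               i: int) -> Optional[bytes]:
    # A reader needs only the three buffers; no per-row deserialization.
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return data[offsets[i]:offsets[i + 1]]


if __name__ == "__main__":
    col = HotVarlenColumn(4)
    col.update(0, b"alice")
    col.update(2, b"bob")
    col.update(0, b"al")  # cheap overwrite while the block is hot
    validity, offsets, data = col.freeze()
    print(offsets, data, validity)  # [0, 2, 2, 5, 5] b'albob' bytearray(b'\x05')
    print(read_value(validity, offsets, data, 2))  # b'bob'
```

The split mirrors the trade-off described in the abstract: the hot representation makes updates cheap, while the frozen buffers are contiguous and can be handed to an external reader as-is, which is what makes low-overhead export possible.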