Resource-aware adaptive indexing for in situ visual exploration and analytics

Bibliographic details
Published in: The VLDB Journal, 2023, Vol. 32 (1), pp. 199-227
Authors: Maroulis, Stavros; Bikakis, Nikos; Papastefanatos, George; Vassiliadis, Panos; Vassiliou, Yannis
Format: Article
Language: English
Description
Abstract: In in situ data management scenarios, large data files that do not fit in main memory must be handled efficiently on commodity hardware, without the overhead of a preprocessing phase or of loading the data into a database. In this work, we study the challenges posed by visual analysis tasks in in situ scenarios under memory constraints. We present an indexing scheme and adaptive query evaluation techniques that enable efficient group-by and filter operations on categorical attributes, combined with 2D visual interactions such as exploring data points on maps or scatter plots. The indexing scheme combines a tile-based structure, which supports efficient visual exploration over the 2D plane, with a tree-based structure, which organizes each tile's objects by their categorical values. The index is constructed on the fly, resides in main memory, and is built progressively as the user explores parts of the raw file, while its structure and level of granularity adapt to the user's exploration areas and type of analysis. To handle cases where only limited resources are available, we introduce a resource-aware index initialization mechanism, formulate it as an NP-hard optimization problem, and propose two efficient approximation algorithms to solve it. We conduct extensive experiments on real and synthetic datasets and demonstrate that our approach achieves interactive query response times (under 0.04 s) and, in most cases, is more than 100× faster and performs up to two orders of magnitude fewer I/O operations than existing solutions. The proposed methods are implemented as part of an open-source system for in situ visual exploration and analytics.
ISSN: 1066-8888; 0949-877X
DOI:10.1007/s00778-022-00739-z
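The abstract describes a two-level in-memory index: a grid of 2D tiles, each of which organizes its objects by categorical values, with the categorical part built only as the user explores. The Python sketch below illustrates that idea under simplifying assumptions that do not come from the paper: a fixed uniform grid, a fixed two-level categorical hierarchy (`cat_a`, then `cat_b`), an in-memory list standing in for the raw file, and count/sum aggregates. All names (`Tile`, `TileCategoryIndex`, `query`, `reads`) are hypothetical, and the sketch omits the paper's adaptive tile refinement and its resource-aware initialization algorithms.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical raw-record schema: (x, y, cat_a, cat_b, value).
Record = Tuple[float, float, str, str, float]


@dataclass
class Tile:
    offsets: List[int] = field(default_factory=list)  # positions of this tile's objects in the raw data
    bbox: Optional[List[float]] = None  # [min_x, min_y, max_x, max_y] of the tile's objects
    tree: Optional[Dict[str, Dict[str, list]]] = None  # built lazily: cat_a -> cat_b -> [count, sum]


class TileCategoryIndex:
    """Hypothetical sketch: a uniform grid of tiles, each with a lazily built categorical tree."""

    def __init__(self, raw: List[Record], x_range, y_range, n: int):
        self.raw, self.n = raw, n
        (self.x0, self.x1), (self.y0, self.y1) = x_range, y_range
        self.tiles: Dict[Tuple[int, int], Tile] = {}
        self.reads = 0  # raw-record accesses made by queries, a stand-in for I/O cost
        # Initial pass: only assign object offsets (and a bounding box) to grid tiles.
        for off, (x, y, *_) in enumerate(raw):
            tile = self.tiles.setdefault(self._cell(x, y), Tile())
            tile.offsets.append(off)
            b = tile.bbox
            tile.bbox = [x, y, x, y] if b is None else [
                min(b[0], x), min(b[1], y), max(b[2], x), max(b[3], y)]

    def _axis(self, v, lo, hi):
        # Monotonic mapping of a coordinate to a tile index, clamped to [0, n-1].
        return max(0, min(int((v - lo) / (hi - lo) * self.n), self.n - 1))

    def _cell(self, x, y):
        return self._axis(x, self.x0, self.x1), self._axis(y, self.y0, self.y1)

    def _read(self, off) -> Record:
        self.reads += 1
        return self.raw[off]

    def _build_tree(self, tile):
        # Organize the tile's objects by categorical values, keeping aggregates at the leaves.
        tree = {}
        for off in tile.offsets:
            _, _, a, b, v = self._read(off)
            agg = tree.setdefault(a, {}).setdefault(b, [0, 0.0])
            agg[0] += 1
            agg[1] += v
        tile.tree = tree

    def query(self, window, keep_a: Optional[Set[str]] = None):
        """Group objects inside window by cat_b, optionally filtering on cat_a: cat_b -> [count, sum]."""
        qx0, qy0, qx1, qy1 = window
        (i0, j0), (i1, j1) = self._cell(qx0, qy0), self._cell(qx1, qy1)
        result = defaultdict(lambda: [0, 0.0])
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                tile = self.tiles.get((i, j))
                if tile is None:
                    continue
                bx0, by0, bx1, by1 = tile.bbox
                if qx0 <= bx0 and bx1 <= qx1 and qy0 <= by0 and by1 <= qy1:
                    # Fully covered tile: answer from its categorical tree, built on first use.
                    if tile.tree is None:
                        self._build_tree(tile)
                    for a, groups in tile.tree.items():
                        if keep_a is None or a in keep_a:
                            for b, (c, s) in groups.items():
                                result[b][0] += c
                                result[b][1] += s
                elif bx1 < qx0 or bx0 > qx1 or by1 < qy0 or by0 > qy1:
                    continue  # no object of this tile can fall inside the window
                else:
                    # Partially covered tile: check each object against the window.
                    for off in tile.offsets:
                        x, y, a, b, v = self._read(off)
                        if qx0 <= x <= qx1 and qy0 <= y <= qy1 and (keep_a is None or a in keep_a):
                            result[b][0] += 1
                            result[b][1] += v
        return dict(result)
```

In this sketch, the first query over a fully covered tile pays the cost of reading its objects once to build the tree; later group-by or filter queries over the same tile are answered from the tree without touching the raw data, which mirrors, in a much simplified form, the progressive construction the abstract describes. Tracking each tile's bounding box is a choice made here so that "fully covered" is decided exactly from the objects themselves rather than from floating-point tile boundaries.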