Distributed arrays: an algebra for generic distributed query processing
We propose a simple model for distributed query processing based on the concept of a distributed array . Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra . The arrays...
Gespeichert in:
Veröffentlicht in: | Distributed and parallel databases : an international journal 2021, Vol.39 (4), p.1009-1064 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We propose a simple model for distributed query processing based on the concept of a
distributed array
. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the
distributed algebra
. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the
basic algebra
implemented by some piece of software called the
basic engine
. It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the
master
which controls a set of basic engines called the
workers
. It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the
Secondo
system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation. |
---|---|
ISSN: | 0926-8782 1573-7578 |
DOI: | 10.1007/s10619-021-07325-2 |