Distributed arrays: an algebra for generic distributed query processing

We propose a simple model for distributed query processing based on the concept of a distributed array . Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra . The arrays...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Distributed and parallel databases : an international journal 2021, Vol.39 (4), p.1009-1064
Hauptverfasser: Güting, Ralf Hartmut, Behr, Thomas, Nidzwetzki, Jan Kristof
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We propose a simple model for distributed query processing based on the concept of a distributed array . Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra . The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the basic algebra implemented by some piece of software called the basic engine . It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the master which controls a set of basic engines called the workers . It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the Secondo system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation.
ISSN:0926-8782
1573-7578
DOI:10.1007/s10619-021-07325-2