Distributed arrays: an algebra for generic distributed query processing

Bibliographic details
Published in:	Distributed and Parallel Databases: An International Journal, 2021, Vol. 39 (4), pp. 1009-1064
Main authors:	Güting, Ralf Hartmut; Behr, Thomas; Nidzwetzki, Jan Kristof
Format:	Article
Language:	English
Subjects:	Algebra; Algorithms; Arrays; Clustering; Computer Science; Data Structures; Database Management; Extensibility; Information Systems Applications (incl. Internet); Memory Structures; Operating Systems; Queries; Query languages; Query processing

Description
Abstract:	We propose a simple model for distributed query processing based on the concept of a distributed array. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations, called the basic algebra, implemented by some piece of software called the basic engine. It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine, called the master, which controls a set of basic engines called the workers. It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added to the extensible basic engine is immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. The Secondo system is used as the basic engine; it is a rich environment for extensible query processing, providing useful tools such as main-memory M-trees, graphs, or a DBScan implementation.
ISSN:	0926-8782 1573-7578
DOI:	10.1007/s10619-021-07325-2
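The master/worker mapping described in the abstract can be illustrated with a minimal Python sketch. It assumes that in-process objects stand in for the worker machines and that a thread pool stands in for parallel execution; the names `Worker`, `DArray`, `distribute`, `dmap`, and `collect` are hypothetical and are not Secondo's API. Any single-machine Python function can be passed as the operation, which mirrors (in a simplified way) the claim that every operation of the basic engine becomes available on distributed arrays. Round-robin partitioning by position is likewise an assumption made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Worker:
    """Hypothetical stand-in for one basic engine running on one machine."""
    name: str
    store: Dict[str, Any]

    def apply(self, op: Callable[[Any], Any], src: str, dst: str) -> None:
        # Run a single-machine ("basic algebra") operation on a locally stored field.
        self.store[dst] = op(self.store[src])


@dataclass
class DArray:
    """One-dimensional distributed array: field i lives on worker i % len(workers)."""
    name: str
    size: int
    workers: List[Worker]

    def location(self, i: int) -> Worker:
        return self.workers[i % len(self.workers)]

    def key(self, i: int) -> str:
        return f"{self.name}_{i}"


def distribute(data: List[Any], name: str, size: int, workers: List[Worker]) -> DArray:
    # Assumed partitioning: element j goes to field j % size (round-robin by position).
    arr = DArray(name, size, workers)
    for i in range(size):
        arr.location(i).store[arr.key(i)] = data[i::size]
    return arr


def dmap(master: ThreadPoolExecutor, arr: DArray,
         op: Callable[[Any], Any], result: str) -> DArray:
    """Master side: apply op to every field in parallel, on the worker holding it."""
    out = DArray(result, arr.size, arr.workers)
    futures = [master.submit(arr.location(i).apply, op, arr.key(i), out.key(i))
               for i in range(arr.size)]
    for f in futures:
        f.result()  # re-raise any error that occurred on a worker
    return out


def collect(arr: DArray) -> List[Any]:
    # Gather all field values back to the master, in field order.
    return [arr.location(i).store[arr.key(i)] for i in range(arr.size)]
```

For example, distributing `list(range(10))` into four fields over two workers and mapping `len` over it yields `[3, 3, 2, 2]`; mapping `sum` yields `[12, 15, 8, 10]`. Results stay on the workers as a new distributed array until `collect` is called, which loosely reflects the idea that a distributed operation produces another distributed array rather than moving data to the master.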