In-database connected component analysis

Bibliographic details
Published in:	arXiv.org 2018-02
Main authors:	Bögeholz, Harald; Brand, Michael; Todor, Radu-Alexandru
Format:	Article
Language:	English
Subjects:	Algorithms; Data management; Parallel processing; Query languages; Relational data bases; Run time (computers)
Online access:	Full text

Description
Abstract:	We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm described is a linear-space, randomised algorithm, always terminating with the correct answer but subject to a stochastic running time, such that for any \(\epsilon>0\) and any input graph \(G=\langle V, E \rangle\) the algorithm terminates after \(\mathop{\text{O}}(\log |V|)\) SQL queries with probability of at least \(1-\epsilon\), which we show empirically to translate to a quasi-linear runtime in practice.
ISSN:	2331-8422