Combining Vertex-Centric Graph Processing with SPARQL for Large-Scale RDF Data Analytics

Modern applications require sophisticated analytics on RDF graphs that combine structural queries with generic graph computations. Existing systems support either declarative SPARQL queries, or generic graph processing, but not both. We bridge the gap by introducing Spartex, a versatile framework fo...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on parallel and distributed systems 2017-12, Vol.28 (12), p.3374-3388
Hauptverfasser: Abdelaziz, Ibrahim, Harbi, Razen, Salihoglu, Semih, Kalnis, Panos
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Modern applications require sophisticated analytics on RDF graphs that combine structural queries with generic graph computations. Existing systems support either declarative SPARQL queries, or generic graph processing, but not both. We bridge the gap by introducing Spartex, a versatile framework for complex RDF analytics. Spartex extends SPARQL to combine seamlessly generic graph algorithms (e.g., PageRank, Shortest Paths, etc.) with SPARQL queries. Spartex builds on existing vertex-centric graph processing frameworks, such as Graphlab or Pregel. It implements a generic SPARQL operator as a vertex-centric program that interprets SPARQL queries and executes them efficiently using a built-in optimizer. In addition, any graph algorithm implemented in the underlying vertex-centric framework, can be executed in Spartex. We present various scenarios where our framework simplifies significantly the implementation of complex RDF data analytics programs. We demonstrate that Spartex scales to datasets with billions of edges, and show that our core SPARQL engine is at least as fast as the state-of-the-art specialized RDF engines. For complex analytical tasks that combine generic graph processing with SPARQL, Spartex is at least an order of magnitude faster than existing alternatives.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2017.2720174