Toward a data scalable solution for facilitating discovery of science resources

Detailed description

Bibliographic details
Published in:	Parallel Computing, 2014-12, Vol.40 (10), p.682-696
Main authors:	Weaver, Jesse; Castellana, Vito Giovanni; Morari, Alessandro; Tumeo, Antonino; Purohit, Sumit; Chappell, Alan; Haglin, David; Villa, Oreste; Choudhury, Sutanay; Schuchardt, Karen; Feo, John
Format:	Article
Language:	English
Subjects:	aggregation; Benchmarking; Clusters; Data intensive; Gems; GMT; Graph database; Graphs; Heterogeneity; Mathematical models; Metadata; multithreading; Query processing; RDESC; Scalability; Science metadata; semantic graph databases; Semantics; SGEM
Online access:	Full text

Description
Summary:
•An extended description of the RDESC use case with example metadata.
•An updated description of the GEMS software stack to reflect its latest state.
•A more in-depth evaluation of GEMS’ ability to answer science-based queries.
•A performance and scalability evaluation using the BSBM benchmark.
Data-intensive science simultaneously derives from and creates the need for large quantities of data. As such, scientists increasingly need to discover and analyze new datasets from diverse sources. Beyond the sheer volume of data, issues posed by the resultant data heterogeneity are often overlooked. We postulate that heterogeneity challenges can be solved (at least in part) with the adoption of the Resource Description Framework (RDF), a graph-based data model. In turn, this requires scalable graph query systems for discovering and analyzing data. Consequently, we investigate GEMS, a graph engine for large-scale clusters. We describe the features of GEMS that make it suitable for answering graph queries and scaling to larger quantities of data. We evaluate GEMS’ ability to answer real science-based queries over real-world, curated, science metadata. We also demonstrate GEMS’ ability to scale to larger datasets using a benchmark.
ISSN:	0167-8191, 1872-7336
DOI:	10.1016/j.parco.2014.08.002