Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing

The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets that have prompted the de...

Bibliographic Details
Published in: Information systems frontiers 2021-02, Vol.23 (1), p.165-183
Main Authors: Khelil, Abdallah; Mesmoudi, Amin; Galicia, Jorge; Bellatreche, Ladjel; Hacid, Mohand-Saïd; Coquery, Emmanuel
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 183
container_issue 1
container_start_page 165
container_title Information systems frontiers
container_volume 23
creator Khelil, Abdallah
Mesmoudi, Amin
Galicia, Jorge
Bellatreche, Ladjel
Hacid, Mohand-Saïd
Coquery, Emmanuel
description The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets that have prompted the development of efficient RDF processing systems. Current approaches fall into two groups: the first adopts the relational model and stores the triples in tables; the second creates data structures that model RDF data as a graph. The strategies of the first group scale more easily because they apply optimization strategies from the relational model, such as indexing and fragmentation. However, these approaches incur considerable overhead when dealing with the complex queries (e.g. compound SPARQL graphs involving filters) found in existing applications. On the other hand, graph-based systems that use more complex data structures fail to manage main memory efficiently and do not scale on hardware with limited resources. In this paper, we propose a novel approach to processing queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and explain at length the strategy, based on the Volcano execution model, used to manage main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.
doi_str_mv 10.1007/s10796-020-09998-z
format Article
fullrecord <record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_03185258v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2487069455</sourcerecordid><originalsourceid>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</originalsourceid><addsrcrecordid>eNp9kMtOwzAURC0EEqXwA6wssWJh8CO242VV-kCqxHttOYnTpkrjYKeI9utxCYIdq3s1OjMaDQCXBN8QjOVtIFgqgTDFCCulUrQ_AgPCJUUqIeo4_iyViDEqTsFZCGuMiaCSD8DD2G2yqqmaJZx5067g5LOtnTdd5RpomgJOvVlubNP1Suk8fMlNbbLawue7KXzaWr-Dj97lNoSYcg5OSlMHe_Fzh-BtOnkdz9HiYXY_Hi1QzjjrEOc2KSgtRMGFYTTjNhbMYqMkySy3FptUGStlwWSpjOKqzFiJcyuzUiQ5NmwIrvvclal166uN8TvtTKXno4U-aJiRlFOefpDIXvVs69371oZOr93WN7GepkkqsVAJ55GiPZV7F4K35W8swfowsu5H1nFk_T2y3kcT600hws3S-r_of1xfAqR_bQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2487069455</pqid></control><display><type>article</type><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><source>SpringerLink Journals - AutoHoldings</source><creator>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</creator><creatorcontrib>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</creatorcontrib><description>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. 
The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. 
According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</description><identifier>ISSN: 1387-3326</identifier><identifier>EISSN: 1572-9419</identifier><identifier>DOI: 10.1007/s10796-020-09998-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Business and Management ; Computer Science ; Control ; Data structures ; Datasets ; Fragmentation ; Indexing ; Information systems ; IT in Business ; Management of Computing and Information Systems ; Operations Research/Decision Theory ; Optimization ; Queries ; Query processing ; Resource Description Framework-RDF ; Storage ; Systems Theory</subject><ispartof>Information systems frontiers, 2021-02, Vol.23 (1), p.165-183</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</citedby><cites>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</cites><orcidid>0000-0003-1307-591X ; 0000-0001-9968-0066</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10796-020-09998-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10796-020-09998-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://hal.science/hal-03185258$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Khelil, 
Abdallah</creatorcontrib><creatorcontrib>Mesmoudi, Amin</creatorcontrib><creatorcontrib>Galicia, Jorge</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Hacid, Mohand-Saïd</creatorcontrib><creatorcontrib>Coquery, Emmanuel</creatorcontrib><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><title>Information systems frontiers</title><addtitle>Inf Syst Front</addtitle><description>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. 
We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</description><subject>Business and Management</subject><subject>Computer Science</subject><subject>Control</subject><subject>Data structures</subject><subject>Datasets</subject><subject>Fragmentation</subject><subject>Indexing</subject><subject>Information systems</subject><subject>IT in Business</subject><subject>Management of Computing and Information Systems</subject><subject>Operations Research/Decision Theory</subject><subject>Optimization</subject><subject>Queries</subject><subject>Query processing</subject><subject>Resource Description Framework-RDF</subject><subject>Storage</subject><subject>Systems Theory</subject><issn>1387-3326</issn><issn>1572-9419</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMtOwzAURC0EEqXwA6wssWJh8CO242VV-kCqxHttOYnTpkrjYKeI9utxCYIdq3s1OjMaDQCXBN8QjOVtIFgqgTDFCCulUrQ_AgPCJUUqIeo4_iyViDEqTsFZCGuMiaCSD8DD2G2yqqmaJZx5067g5LOtnTdd5RpomgJOvVlubNP1Suk8fMlNbbLawue7KXzaWr-Dj97lNoSYcg5OSlMHe_Fzh-BtOnkdz9HiYXY_Hi1QzjjrEOc2KSgtRMGFYTTjNhbMYqMkySy3FptUGStlwWSpjOKqzFiJcyuzUiQ5NmwIrvvclal166uN8TvtTKXno4U-aJiRlFOefpDIXvVs69371oZOr93WN7GepkkqsVAJ55GiPZV7F4K35W8swfowsu5H1nFk_T2y3kcT600hws3S-r_of1xfAqR_bQ</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Khelil, Abdallah</creator><creator>Mesmoudi, Amin</creator><creator>Galicia, Jorge</creator><creator>Bellatreche, Ladjel</creator><creator>Hacid, 
Mohand-Saïd</creator><creator>Coquery, Emmanuel</creator><general>Springer US</general><general>Springer Nature B.V</general><general>Springer Verlag</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M1O</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0003-1307-591X</orcidid><orcidid>https://orcid.org/0000-0001-9968-0066</orcidid></search><sort><creationdate>20210201</creationdate><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><author>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Business and Management</topic><topic>Computer Science</topic><topic>Control</topic><topic>Data structures</topic><topic>Datasets</topic><topic>Fragmentation</topic><topic>Indexing</topic><topic>Information 
systems</topic><topic>IT in Business</topic><topic>Management of Computing and Information Systems</topic><topic>Operations Research/Decision Theory</topic><topic>Optimization</topic><topic>Queries</topic><topic>Query processing</topic><topic>Resource Description Framework-RDF</topic><topic>Storage</topic><topic>Systems Theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Khelil, Abdallah</creatorcontrib><creatorcontrib>Mesmoudi, Amin</creatorcontrib><creatorcontrib>Galicia, Jorge</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Hacid, Mohand-Saïd</creatorcontrib><creatorcontrib>Coquery, Emmanuel</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Library 
&amp; Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Library Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>Information systems frontiers</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khelil, Abdallah</au><au>Mesmoudi, Amin</au><au>Galicia, Jorge</au><au>Bellatreche, Ladjel</au><au>Hacid, Mohand-Saïd</au><au>Coquery, Emmanuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Combining Graph Exploration and Fragmentation for Scalable RDF 
Query Processing</atitle><jtitle>Information systems frontiers</jtitle><stitle>Inf Syst Front</stitle><date>2021-02-01</date><risdate>2021</risdate><volume>23</volume><issue>1</issue><spage>165</spage><epage>183</epage><pages>165-183</pages><issn>1387-3326</issn><eissn>1572-9419</eissn><abstract>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. 
We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10796-020-09998-z</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-1307-591X</orcidid><orcidid>https://orcid.org/0000-0001-9968-0066</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1387-3326
ispartof Information systems frontiers, 2021-02, Vol.23 (1), p.165-183
issn 1387-3326
1572-9419
language eng
recordid cdi_hal_primary_oai_HAL_hal_03185258v1
source SpringerLink Journals - AutoHoldings
subjects Business and Management
Computer Science
Control
Data structures
Datasets
Fragmentation
Indexing
Information systems
IT in Business
Management of Computing and Information Systems
Operations Research/Decision Theory
Optimization
Queries
Query processing
Resource Description Framework-RDF
Storage
Systems Theory
title Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T13%3A48%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combining%20Graph%20Exploration%20and%20Fragmentation%20for%20Scalable%20RDF%20Query%20Processing&rft.jtitle=Information%20systems%20frontiers&rft.au=Khelil,%20Abdallah&rft.date=2021-02-01&rft.volume=23&rft.issue=1&rft.spage=165&rft.epage=183&rft.pages=165-183&rft.issn=1387-3326&rft.eissn=1572-9419&rft_id=info:doi/10.1007/s10796-020-09998-z&rft_dat=%3Cproquest_hal_p%3E2487069455%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2487069455&rft_id=info:pmid/&rfr_iscdi=true