Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing

The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets that have prompted the de...

Bibliographic details
Published in:	Information systems frontiers 2021-02, Vol.23 (1), p.165-183
Main authors:	Khelil, Abdallah; Mesmoudi, Amin; Galicia, Jorge; Bellatreche, Ladjel; Hacid, Mohand-Saïd; Coquery, Emmanuel
Format:	Article
Language:	eng
Subjects:	Business and Management; Computer Science; Control; Data structures; Datasets; Fragmentation; Indexing; Information systems; IT in Business; Management of Computing and Information Systems; Operations Research/Decision Theory; Optimization; Queries; Query processing; Resource Description Framework-RDF; Storage; Systems Theory
Online access:	Full text

container_end_page	183
container_issue	1
container_start_page	165
container_title	Information systems frontiers
container_volume	23
creator	Khelil, Abdallah; Mesmoudi, Amin; Galicia, Jorge; Bellatreche, Ladjel; Hacid, Mohand-Saïd; Coquery, Emmanuel
description	The flexibility offered by the Resource Description Framework (RDF) has made it a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets that have prompted the development of efficient RDF processing systems. Current approaches fall into two groups: the first adopts the relational model and stores the triples in tables, and the second creates data structures that model RDF data as a graph. The strategies of the first group scale more easily since they apply optimization strategies from the relational model such as indexing and fragmentation. However, these approaches incur significant overhead when dealing with complex queries (e.g. compound SPARQL graphs involving filters) that are prevalent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to manage main memory efficiently and do not scale on hardware with limited resources. In this paper, we propose a novel approach to evaluating queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data that combines RDF graph exploration with physical fragmentation of triples. We describe our graph-based storage and query evaluation models. We then detail the architecture of our system and explain at length the strategy, based on the Volcano execution model, used to manage main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.
doi_str_mv	10.1007/s10796-020-09998-z
format	Article
fullrecord	<record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_03185258v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2487069455</sourcerecordid><originalsourceid>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</originalsourceid><addsrcrecordid>eNp9kMtOwzAURC0EEqXwA6wssWJh8CO242VV-kCqxHttOYnTpkrjYKeI9utxCYIdq3s1OjMaDQCXBN8QjOVtIFgqgTDFCCulUrQ_AgPCJUUqIeo4_iyViDEqTsFZCGuMiaCSD8DD2G2yqqmaJZx5067g5LOtnTdd5RpomgJOvVlubNP1Suk8fMlNbbLawue7KXzaWr-Dj97lNoSYcg5OSlMHe_Fzh-BtOnkdz9HiYXY_Hi1QzjjrEOc2KSgtRMGFYTTjNhbMYqMkySy3FptUGStlwWSpjOKqzFiJcyuzUiQ5NmwIrvvclal166uN8TvtTKXno4U-aJiRlFOefpDIXvVs69371oZOr93WN7GepkkqsVAJ55GiPZV7F4K35W8swfowsu5H1nFk_T2y3kcT600hws3S-r_of1xfAqR_bQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2487069455</pqid></control><display><type>article</type><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><source>SpringerLink Journals - AutoHoldings</source><creator>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</creator><creatorcontrib>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</creatorcontrib><description>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. 
The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. 
According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</description><identifier>ISSN: 1387-3326</identifier><identifier>EISSN: 1572-9419</identifier><identifier>DOI: 10.1007/s10796-020-09998-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Business and Management ; Computer Science ; Control ; Data structures ; Datasets ; Fragmentation ; Indexing ; Information systems ; IT in Business ; Management of Computing and Information Systems ; Operations Research/Decision Theory ; Optimization ; Queries ; Query processing ; Resource Description Framework-RDF ; Storage ; Systems Theory</subject><ispartof>Information systems frontiers, 2021-02, Vol.23 (1), p.165-183</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</citedby><cites>FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</cites><orcidid>0000-0003-1307-591X ; 0000-0001-9968-0066</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10796-020-09998-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10796-020-09998-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://hal.science/hal-03185258$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Khelil, 
Abdallah</creatorcontrib><creatorcontrib>Mesmoudi, Amin</creatorcontrib><creatorcontrib>Galicia, Jorge</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Hacid, Mohand-Saïd</creatorcontrib><creatorcontrib>Coquery, Emmanuel</creatorcontrib><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><title>Information systems frontiers</title><addtitle>Inf Syst Front</addtitle><description>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. 
We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</description><subject>Business and Management</subject><subject>Computer Science</subject><subject>Control</subject><subject>Data structures</subject><subject>Datasets</subject><subject>Fragmentation</subject><subject>Indexing</subject><subject>Information systems</subject><subject>IT in Business</subject><subject>Management of Computing and Information Systems</subject><subject>Operations Research/Decision Theory</subject><subject>Optimization</subject><subject>Queries</subject><subject>Query processing</subject><subject>Resource Description Framework-RDF</subject><subject>Storage</subject><subject>Systems Theory</subject><issn>1387-3326</issn><issn>1572-9419</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMtOwzAURC0EEqXwA6wssWJh8CO242VV-kCqxHttOYnTpkrjYKeI9utxCYIdq3s1OjMaDQCXBN8QjOVtIFgqgTDFCCulUrQ_AgPCJUUqIeo4_iyViDEqTsFZCGuMiaCSD8DD2G2yqqmaJZx5067g5LOtnTdd5RpomgJOvVlubNP1Suk8fMlNbbLawue7KXzaWr-Dj97lNoSYcg5OSlMHe_Fzh-BtOnkdz9HiYXY_Hi1QzjjrEOc2KSgtRMGFYTTjNhbMYqMkySy3FptUGStlwWSpjOKqzFiJcyuzUiQ5NmwIrvvclal166uN8TvtTKXno4U-aJiRlFOefpDIXvVs69371oZOr93WN7GepkkqsVAJ55GiPZV7F4K35W8swfowsu5H1nFk_T2y3kcT600hws3S-r_of1xfAqR_bQ</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Khelil, Abdallah</creator><creator>Mesmoudi, Amin</creator><creator>Galicia, Jorge</creator><creator>Bellatreche, Ladjel</creator><creator>Hacid, 
Mohand-Saïd</creator><creator>Coquery, Emmanuel</creator><general>Springer US</general><general>Springer Nature B.V</general><general>Springer Verlag</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M1O</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0003-1307-591X</orcidid><orcidid>https://orcid.org/0000-0001-9968-0066</orcidid></search><sort><creationdate>20210201</creationdate><title>Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing</title><author>Khelil, Abdallah ; Mesmoudi, Amin ; Galicia, Jorge ; Bellatreche, Ladjel ; Hacid, Mohand-Saïd ; Coquery, Emmanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c353t-55e4d22d6d56a32b5e138b27544be5ee0a89ae77d37f9a959fb3f0ce7bf64c0a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Business and Management</topic><topic>Computer Science</topic><topic>Control</topic><topic>Data structures</topic><topic>Datasets</topic><topic>Fragmentation</topic><topic>Indexing</topic><topic>Information 
systems</topic><topic>IT in Business</topic><topic>Management of Computing and Information Systems</topic><topic>Operations Research/Decision Theory</topic><topic>Optimization</topic><topic>Queries</topic><topic>Query processing</topic><topic>Resource Description Framework-RDF</topic><topic>Storage</topic><topic>Systems Theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Khelil, Abdallah</creatorcontrib><creatorcontrib>Mesmoudi, Amin</creatorcontrib><creatorcontrib>Galicia, Jorge</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Hacid, Mohand-Saïd</creatorcontrib><creatorcontrib>Coquery, Emmanuel</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Library & 
Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Library Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>Information systems frontiers</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khelil, Abdallah</au><au>Mesmoudi, Amin</au><au>Galicia, Jorge</au><au>Bellatreche, Ladjel</au><au>Hacid, Mohand-Saïd</au><au>Coquery, Emmanuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Combining Graph Exploration and Fragmentation for Scalable RDF Query 
Processing</atitle><jtitle>Information systems frontiers</jtitle><stitle>Inf Syst Front</stitle><date>2021-02-01</date><risdate>2021</risdate><volume>23</volume><issue>1</issue><spage>165</spage><epage>183</epage><pages>165-183</pages><issn>1387-3326</issn><eissn>1572-9419</eissn><abstract>The flexibility offered by the Resource Description Framework (RDF) has led it to become a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large scale multidisciplinary datasets, that have prompted the development of efficient RDF processing systems. Current approaches can be distinguished into two groups: the first, adopting the relational model storing the triples in tables, and the second creating data structures that model RDF data as a graph. The strategies of the first group are more easily scalable since they apply optimization strategies from the relational model like indexing and fragmentation. However, these approaches suffer many overheads when dealing with complex queries (e.g. compounded SPARQL graphs involving filters) persistent in existing applications. On the other hand, graph-based systems that use more complex data structures fail to efficiently manage the main memory and are not scalable in computer hardware with limited resources. In this paper, we propose a novel approach to perform queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose to combine both RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and we largely explain the strategy, based in the Volcano execution model, used to manage the main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. 
We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10796-020-09998-z</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-1307-591X</orcidid><orcidid>https://orcid.org/0000-0001-9968-0066</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1387-3326
ispartof	Information systems frontiers, 2021-02, Vol.23 (1), p.165-183
issn	1387-3326 1572-9419
language	eng
recordid	cdi_hal_primary_oai_HAL_hal_03185258v1
source	SpringerLink Journals - AutoHoldings
subjects	Business and Management; Computer Science; Control; Data structures; Datasets; Fragmentation; Indexing; Information systems; IT in Business; Management of Computing and Information Systems; Operations Research/Decision Theory; Optimization; Queries; Query processing; Resource Description Framework-RDF; Storage; Systems Theory
title	Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T13%3A48%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combining%20Graph%20Exploration%20and%20Fragmentation%20for%20Scalable%20RDF%20Query%20Processing&rft.jtitle=Information%20systems%20frontiers&rft.au=Khelil,%20Abdallah&rft.date=2021-02-01&rft.volume=23&rft.issue=1&rft.spage=165&rft.epage=183&rft.pages=165-183&rft.issn=1387-3326&rft.eissn=1572-9419&rft_id=info:doi/10.1007/s10796-020-09998-z&rft_dat=%3Cproquest_hal_p%3E2487069455%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2487069455&rft_id=info:pmid/&rfr_iscdi=true