Compression in XML search engines

The structure of XML documents can be used by search engines to answer structured queries or to provide better relevancy. Several index structures exist for search in XML data. This study focuses on inverted lists with dictionary-coded path types and Dewey-coded path instances. The Dewey-coded path...
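
As a rough illustration of the ideas summarized here, the sketch below treats each Dewey-coded path instance as a list of integers, transposes a set of them from row-wise to column-wise layout, and byte-encodes integers with VByte, one of the integer coding methods named in the full description of this record. It is a minimal sketch under stated assumptions, not the layout used in the thesis: the function names, the separate `lengths` list, and the choice of setting the high bit on the last byte of each integer are all hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers with 7 payload bits per byte.

    Assumed convention: low-order group first, high bit set on the
    final byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)      # terminating byte, high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers: List[int] = []
    value, shift = 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(value | ((byte & 0x7F) << shift))
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return numbers


def to_columns(deweys: Sequence[Sequence[int]]) -> Tuple[List[int], List[List[int]]]:
    """Transpose row-wise Dewey codes into one list per tree level.

    The per-code lengths are kept separately (an assumption made here)
    so that codes of different depth can be rebuilt.
    """
    lengths = [len(d) for d in deweys]
    depth = max(lengths, default=0)
    columns = [[d[level] for d in deweys if len(d) > level] for level in range(depth)]
    return lengths, columns


def from_columns(lengths: Sequence[int], columns: Sequence[Sequence[int]]) -> List[List[int]]:
    """Rebuild row-wise Dewey codes from the column-wise layout."""
    cursors = [0] * len(columns)
    rows: List[List[int]] = []
    for length in lengths:
        row = []
        for level in range(length):
            row.append(columns[level][cursors[level]])
            cursors[level] += 1
        rows.append(row)
    return rows
```

Values at the same tree level tend to be similar, so grouping them in one column is a plausible reason such a layout compresses well; each column could then be passed to `vbyte_encode` or to another integer coder.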

Bibliographic details
Main author:	Natvig, Ola
Format:	Dissertation
Language:	eng
Subjects:	Komplekse datasystemer ntnudaim SIF2 datateknikk

creator	Natvig, Ola
description	The structure of XML documents can be used by search engines to answer structured queries or to provide better relevancy. Several index structures exist for search in XML data. This study focuses on inverted lists with dictionary-coded path types and Dewey-coded path instances. The Dewey-coded path index is large, but could be compressed. This study examines query processing with indexes encoded using the well-known integer coding methods VByte and PFor(delta), as well as methods tailored to the Dewey index. Intersection queries and structural queries are evaluated. In addition to standard document-level skipping, skip operations for path types are implemented and evaluated. Four extensions over plain PFor methods are proposed and tested. Path type sorting sorts Dewey codes by their path types and stores all Dewey codes from one path type together. Column-wise Dewey storage stores the Dewey codes in columns instead of rows. Prefix coding, a well-known method, is adapted to the column-wise Dewey storage, and a dynamic column-wise method chooses between row-wise and column-wise storage based on the compressed data. Experiments are performed on an XML collection based on Wikipedia. Queries are generated from the TREC 06 efficiency task query trace. Several different types of structural queries have been executed. Experiments show that column-wise methods perform well on both intersection and structural queries. The dynamic column-wise scheme is in most cases the best and creates the smallest index. Special-purpose skipping for path types makes some queries extremely fast and can be implemented with only a limited storage footprint. The performance of in-memory search with multi-threaded query execution is limited by memory bandwidth.
format	Dissertation
advisor	Torbjørnsen, Øystein ; Bratsberg, Svein Erik
creationdate	2010
creatorcontrib	Natvig, Ola ; Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap ; Torbjørnsen, Øystein ; Bratsberg, Svein Erik
linktorsrc	http://hdl.handle.net/11250/250820
publisher	Institutt for datateknikk og informasjonsvitenskap
rights	info:eu-repo/semantics/openAccess
sourcetype	Open Access Repository
fulltext	fulltext_linktorsrc
language	eng
recordid	cdi_cristin_nora_11250_250820
source	NORA - Norwegian Open Research Archives
subjects	Komplekse datasystemer ntnudaim SIF2 datateknikk
title	Compression in XML search engines
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T00%3A10%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-cristin_3HK&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.genre=dissertation&rft.btitle=Compression%20in%20XML%20search%20engines&rft.au=Natvig,%20Ola&rft.date=2010&rft_id=info:doi/&rft_dat=%3Ccristin_3HK%3E11250_250820%3C/cristin_3HK%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true