Higher compression from the Burrows-Wheeler transform by modified sorting

Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a...

Detailed description

Bibliographic details
Main authors: Chapin, B., Tate, S.R.
Format: Conference proceeding
Language: eng
container_end_page
container_issue
container_start_page 532
container_title
container_volume
creator Chapin, B.
Tate, S.R.
description Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a significant impact on the size of the compressed data. We modify the sorting order in two separate ways. First, we try reordering the symbol alphabet and doing a standard sort based on the permuted character set. This is particularly interesting because the BWT's sensitivity to alphabet ordering is unusual among general-purpose compression schemes. Previous techniques, including statistical techniques (such as the PPM algorithms) and dictionary techniques (represented by LZ77, LZ78, and their descendants), are largely based on pattern matching, which is entirely independent of the encoding used for the source alphabet. On files in which the alphabet is arbitrarily ordered, such as ASCII text and certain domain-specific encodings (for example, the geo file from the Calgary Compression Corpus), this technique improved the compression ratio of the BWT-based compression algorithm. On the other hand, data which already had a significant alphabet ordering, such as image data, showed little improvement with this technique. The second modified sorting technique was to modify the sorting algorithm itself to order strings in a manner analogous to reflected Gray codes. In particular, we alternated increasing and decreasing order on the second character position, changing whenever the character in the first position changed.
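The two sort-order changes described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `bwt`, its parameters `alphabet_order` and `reflect_second`, the NUL sentinel, and the naive rotation sort are all assumptions made for illustration, and the reflected ordering follows only the two-position rule stated in the abstract.

```python
def bwt(text, alphabet_order=None, reflect_second=False, sentinel="\0"):
    """Naive BWT over all rotations of text + sentinel (illustrative only).

    alphabet_order: hypothetical parameter; a string listing symbols in the
        desired sort order. Unlisted symbols sort after it, by code point.
    reflect_second: hypothetical parameter; if True, the second character
        position alternates between increasing and decreasing order each
        time the first character changes.
    Assumes the sentinel does not occur in text.
    """
    s = text + sentinel
    listed = alphabet_order or ""

    def symbol_sort_key(c):
        # Sentinel first, then listed symbols in the given order, then the rest.
        if c == sentinel:
            return (0, 0)
        pos = listed.find(c)
        return (1, pos) if pos >= 0 else (2, ord(c))

    # Dense ranks 0, 1, 2, ... over the symbols that actually occur.
    symbols = sorted(set(s), key=symbol_sort_key)
    rank = {c: i for i, c in enumerate(symbols)}

    def rotation_key(rotation):
        ranks = [rank[c] for c in rotation]
        # For odd-ranked first symbols, reverse the order of the second position.
        if reflect_second and len(ranks) > 1 and ranks[0] % 2 == 1:
            ranks[1] = -ranks[1]
        return ranks

    rotations = sorted((s[i:] + s[:i] for i in range(len(s))), key=rotation_key)
    # The transform output is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in rotations)


standard = bwt("banana")
permuted = bwt("banana", alphabet_order="nba")
reflected = bwt("banana", reflect_second=True)
```

All three outputs are permutations of the same symbols; only the sort order, and therefore the grouping of symbols in the output, differs. A decoder would need to apply the same ordering to invert the transform, which this sketch does not show.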
doi_str_mv 10.1109/DCC.1998.672253
format Conference Proceeding
isbn 9780818684067
0818684062
publisher IEEE
fulltext fulltext_linktorsrc
identifier ISSN: 1068-0314
ispartof DCC (Los Alamitos, Calif.), 1998, p.532
issn 1068-0314
2375-0359
language eng
recordid cdi_ieee_primary_672253
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Compression algorithms
Computer science
Dictionaries
Encoding
Image coding
Pattern matching
Reflective binary codes
Sorting
Testing
title Higher compression from the Burrows-Wheeler transform by modified sorting
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T18%3A41%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Higher%20compression%20from%20the%20Burrows-Wheeler%20transform%20by%20modified%20sorting&rft.btitle=DCC%20(Los%20Alamitos,%20Calif.)&rft.au=Chapin,%20B.&rft.date=1998&rft.spage=532&rft.pages=532-&rft.issn=1068-0314&rft.eissn=2375-0359&rft.isbn=9780818684067&rft.isbn_list=0818684062&rft_id=info:doi/10.1109/DCC.1998.672253&rft_dat=%3Cproquest_6IE%3E27542866%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27542866&rft_id=info:pmid/&rft_ieee_id=672253&rfr_iscdi=true