Higher compression from the Burrows-Wheeler transform by modified sorting
Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a...
Main authors: | Chapin, B. ; Tate, S.R. |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 532 |
container_title | |
container_volume | |
creator | Chapin, B. ; Tate, S.R. |
description | Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a significant impact on the size of the compressed data. We modify the sorting order in two separate ways. First, we try reordering the symbol alphabet, and doing a standard sort based on the permuted character set. This is particularly interesting because the BWT's sensitivity to alphabet ordering is fairly unique among general-purpose compression schemes. Previous techniques, including statistical techniques (such as the PPM algorithms) and dictionary techniques (represented by LZ77, LZ78, and their descendants), are largely based on pattern matching, which is entirely independent of the encoding used for the source alphabet. On files in which the alphabet is arbitrarily ordered, such as ASCII text and certain domain-specific encodings, such as the geo file from the Calgary Compression Corpus, this technique improved the compression ratio of the BWT-based compression algorithm. On the other hand, data which already had a significant alphabet ordering, such as image data, showed little improvement with this technique. The second modified sorting technique was to modify the sorting algorithm itself to order strings in a manner analogous to reflected Gray codes. In particular, we alternated increasing and decreasing order on the second character position, changing whenever the character in the first position changed. |
doi_str_mv | 10.1109/DCC.1998.672253 |
format | Conference Proceeding |
isbn | 9780818684067 ; 0818684062 |
publisher | IEEE |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1068-0314 |
ispartof | DCC (Los Alamitos, Calif.), 1998, p.532 |
issn | ISSN: 1068-0314 ; EISSN: 2375-0359 |
language | eng |
recordid | cdi_ieee_primary_672253 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Compression algorithms ; Computer science ; Dictionaries ; Encoding ; Image coding ; Pattern matching ; Reflective binary codes ; Sorting ; Testing |
title | Higher compression from the Burrows-Wheeler transform by modified sorting |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T18%3A41%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Higher%20compression%20from%20the%20Burrows-Wheeler%20transform%20by%20modified%20sorting&rft.btitle=DCC%20(Los%20Alamitos,%20Calif.)&rft.au=Chapin,%20B.&rft.date=1998&rft.spage=532&rft.pages=532-&rft.issn=1068-0314&rft.eissn=2375-0359&rft.isbn=9780818684067&rft.isbn_list=0818684062&rft_id=info:doi/10.1109/DCC.1998.672253&rft_dat=%3Cproquest_6IE%3E27542866%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27542866&rft_id=info:pmid/&rft_ieee_id=672253&rfr_iscdi=true |
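
The description in this record names two changes to the BWT sorting stage: sorting under a permuted alphabet, and alternating increasing and decreasing order on the second character whenever the first character changes. The Python sketch below only illustrates those two ideas on a naive rotation sort; it is not the authors' implementation. The function names, the `order` parameter, the rule that unlisted bytes sort after listed ones, and the choice to compare the third and later positions in ordinary ascending order are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def _last_column(data: bytes, rows: List[int]) -> Tuple[bytes, int]:
    """Return the last column of the sorted rotations and the row of the input."""
    n = len(data)
    # The rotation starting at i ends with the byte just before position i.
    last = bytes(data[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def bwt(data: bytes, order: Optional[bytes] = None) -> Tuple[bytes, int]:
    """Naive BWT over all rotations, optionally under a permuted alphabet.

    `order` (hypothetical parameter) lists byte values from smallest to
    largest. Assumption: bytes not listed sort after the listed ones,
    in their usual numeric order.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    if order is None:
        rank = list(range(256))
    else:
        rank = [256 + b for b in range(256)]
        for position, b in enumerate(order):
            rank[b] = position
    doubled = data + data

    def key(i: int) -> List[int]:
        # Compare rotations by the rank of each symbol, not its byte value.
        return [rank[b] for b in doubled[i:i + n]]

    return _last_column(data, sorted(range(n), key=key))


def bwt_alternating(data: bytes) -> Tuple[bytes, int]:
    """Rotation sort that flips the second-character order per first character.

    Assumption: the direction flips each time the first character changes
    among the symbols actually present; later positions sort ascending.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    descending = {b: i % 2 == 1 for i, b in enumerate(sorted(set(data)))}

    def key(i: int) -> Tuple[int, int, bytes]:
        row = doubled[i:i + n]
        second = row[1] if n > 1 else 0
        if descending[row[0]]:
            second = -second  # reverse the order within this first-character group
        return (row[0], second, row[2:])

    return _last_column(data, sorted(range(n), key=key))
```

As a small hand-checkable case, `bwt(b"banana")` returns the last column `b"nnbaaa"`, while `bwt(b"banana", order=b"nba")` returns `b"aaabnn"`. The sketch sorts full rotations for clarity and makes no claim about how the output would then be coded or how much compression either ordering yields.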