Higher compression from the Burrows-Wheeler transform by modified sorting
Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a significant impact on the size of the compressed data. We modify the sorting order in two separate ways. First, we try reordering the symbol alphabet, and doing a standard sort based on the permuted character set. This is particularly interesting because the BWT's sensitivity to alphabet ordering is fairly unique among general-purpose compression schemes. Previous techniques, including statistical techniques (such as the PPM algorithms) and dictionary techniques (represented by LZ77, LZ78, and their descendants), are largely based on pattern matching which is entirely independent of the encoding used for the source alphabet. On files in which the alphabet is arbitrarily ordered, such as ASCII text and certain domain-specific encoding; such as the geo file from the Calgary Compression Corpus, this technique improved the compression ratio of the BWT-based compression algorithm. On the other hand, data which already had a significant alphabet ordering, such as image data, showed little improvement with this technique. The second modified sorting technique was to modify the sorting algorithm itself to order strings in a manner analogous to reflected Gray codes. In particular, we alternated increasing and decreasing order on the second character position, changing whenever the character in the first position changed. |
---|---|
ISSN: | 1068-0314 2375-0359 |
DOI: | 10.1109/DCC.1998.672253 |