Higher compression from the Burrows-Wheeler transform by modified sorting

Bibliographic details
Main authors: Chapin, B., Tate, S.R.
Format: Conference paper
Language: English
Description
Abstract: Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a significant impact on the size of the compressed data. We modify the sorting order in two separate ways. First, we try reordering the symbol alphabet and doing a standard sort based on the permuted character set. This is particularly interesting because the BWT's sensitivity to alphabet ordering is unusual among general-purpose compression schemes. Previous techniques, including statistical techniques (such as the PPM algorithms) and dictionary techniques (represented by LZ77, LZ78, and their descendants), are largely based on pattern matching, which is entirely independent of the encoding used for the source alphabet. On files in which the alphabet is arbitrarily ordered, such as ASCII text and certain domain-specific encodings (for example, the geo file from the Calgary Compression Corpus), this technique improved the compression ratio of the BWT-based compression algorithm. On the other hand, data that already had a meaningful alphabet ordering, such as image data, showed little improvement with this technique. The second modified sorting technique was to modify the sorting algorithm itself to order strings in a manner analogous to reflected Gray codes. In particular, we alternated increasing and decreasing order on the second character position, changing whenever the character in the first position changed.
ISSN: 1068-0314, 2375-0359
DOI: 10.1109/DCC.1998.672253
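
The two sorting modifications described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a naive rotation-sorting BWT written for readability, and the function name `bwt` and the parameters `alphabet` and `reflect_second` are hypothetical. The abstract only specifies how the second character position is ordered, so the sketch assumes that all later positions keep the ordinary increasing order.

```python
def bwt(text, alphabet=None, reflect_second=False):
    """Naive BWT by sorting all rotations of `text` (must be non-empty).

    alphabet       -- optional sequence giving a permuted symbol order
                      (assumed to contain every symbol of `text`).
    reflect_second -- if True, the order on the second character flips
                      each time the first character changes, loosely in
                      the spirit of a reflected Gray code (assumption:
                      positions three and later stay in increasing order).

    Returns (last_column, row_index_of_original_text).
    """
    if not text:
        raise ValueError("text must be non-empty")

    present = set(text)
    if alphabet is None:
        symbols = sorted(present)
    else:
        # Keep only symbols that occur, so ranks are dense and the
        # direction really flips at every change of first character.
        symbols = [c for c in alphabet if c in present]
    rank = {c: i for i, c in enumerate(symbols)}

    n = len(text)
    doubled = text + text  # rotation starting at i is doubled[i:i + n]

    def key(start):
        ranks = [rank[c] for c in doubled[start:start + n]]
        # Odd-ranked first characters get a decreasing second position.
        if reflect_second and n > 1 and ranks[0] % 2 == 1:
            ranks[1] = -ranks[1]
        return ranks

    order = sorted(range(n), key=key)
    # The character preceding each sorted rotation forms the output.
    last_column = "".join(text[i - 1] for i in order)
    return last_column, order.index(0)


standard = bwt("mississippi")
permuted = bwt("banana", alphabet="nba")
reflected = bwt("mississippi", reflect_second=True)
```

Comparing `standard` with `reflected` shows that only the block of rotations beginning with an odd-ranked symbol is rearranged. Whether either change helps compression depends on the later coding stages, which this sketch does not model.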