Space-efficient Huffman codes revisited


Bibliographic details
Published in:	Information Processing Letters, 2023-01, Vol. 179, p. 106274, Article 106274
Main authors:	Grabowski, Szymon; Köppl, Dominik
Format:	Article
Language:	English
Keywords:	Canonical code; Compact representation; Data compression; Data structures; Huffman code
Online access:	Full text

Description
Abstract:
• We reduce the space requirements for a canonical code representation.
• Long codewords have a long prefix of consecutive ones.
• Codewords with relatively short prefixes of consecutive ones are few in number.
• Codewords can be distinguished by their last O(log s) bits.
• Grouping codewords and compressing their prefixes lead to improved space bounds.

A canonical Huffman code is an optimal prefix-free compression code whose codewords, when enumerated in lexicographic order, form a list of binary words of non-decreasing length. Gagie et al. (2015) gave a representation of this code that supports encoding and decoding a symbol in constant worst-case time. It uses $\sigma \lg \ell_{\max} + o(\sigma) + O(\ell_{\max}^2)$ bits of space, where $\sigma$ and $\ell_{\max}$ are the alphabet size and the maximum codeword length, respectively. We refine their representation to reduce the space complexity to $\sigma \lg \ell_{\max} (1 + o(1))$ bits while preserving the constant encoding and decoding times. Our algorithmic idea can be applied to any canonical code.
ISSN:	0020-0190, 1872-6119
DOI:	10.1016/j.ipl.2022.106274
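
The definition of a canonical code in the abstract above can be illustrated with a minimal Python sketch. It shows only the textbook assignment of canonical codewords from Huffman codeword lengths, not the space-efficient representation of Gagie et al. or the refinement described in the article. The function names `huffman_lengths` and `canonical_codes` and the sample frequencies are assumptions made for illustration.

```python
import heapq
from itertools import count


def huffman_lengths(freqs):
    """Return {symbol: codeword length} of an optimal prefix-free code."""
    if len(freqs) == 1:
        # A single symbol still needs one bit in this sketch.
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes all their symbols one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codewords (bit strings) from codeword lengths.

    Symbols are sorted by (length, symbol). The first codeword is all
    zeros; each following codeword is the previous one plus 1, padded
    with zeros on the right whenever the length grows.
    """
    ordered = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    codes = {}
    code = 0
    prev_len = ordered[0][1]
    for sym, length in ordered:
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


if __name__ == "__main__":
    freqs = {"a": 5, "b": 2, "c": 1, "d": 1}  # hypothetical frequencies
    codes = canonical_codes(huffman_lengths(freqs))
    for sym, word in sorted(codes.items(), key=lambda kv: kv[1]):
        print(sym, word)
```

With these sample frequencies the codewords are `0`, `10`, `110`, `111`: sorted lexicographically, their lengths never decrease, and the longest codewords begin with a run of ones, in line with the second highlight. A decoder only needs the codeword lengths to rebuild this assignment, which is why space bounds for canonical codes are stated in terms of $\sigma \lg \ell_{\max}$.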