Compressing Chinese text files using an adaptive Huffman coding scheme and a static dictionary of character pairs
The compression method for Chinese text files proposed in this paper is based on a single pass data compression technique, adaptive Huffman coding. All Chinese text files to be compressed are modeled to contain not only ASCII characters, Chinese ideographic characters and punctuation marks, but also...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The compression method for Chinese text files proposed in this paper is based on a single pass data compression technique, adaptive Huffman coding. All Chinese text files to be compressed are modeled to contain not only ASCII characters, Chinese ideographic characters and punctuation marks, but also commonly used Chinese character pairs. The approach of using a static dictionary is employed to maintain about 3000 most frequently occurring character pairs found in general Chinese texts. This is to define the extension to the standard source alphabet in ideogram-based adaptive Huffman coding. The performance in compression ratio and CPU execution time of the proposed method is evaluated against those of the adaptive byte-oriented Huffman coding scheme, the adaptive ideogram-based Huffman coding scheme, and the adaptive LZW method. The experimental results have shown that the proposed method based on adaptive Huffman coding with an extended source alphabet yields better compression on Chinese text files. |
---|---|
DOI: | 10.1109/SICON.1993.515699 |