Using OCR and equalization to downsample documents
Main authors: | , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Summary: | Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable. |
---|---|
DOI: | 10.1109/ICPR.1994.576925 |
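
The summary names linear filtering followed by decimation as the standard baseline. A minimal sketch of that baseline is given here, assuming a grayscale image stored as a list of rows, an integer reduction factor, and a plain box (mean) filter. The function name `box_downsample` is hypothetical, and this is not the paper's nonlinear document equalization, which the record does not describe in enough detail to reproduce.

```python
def box_downsample(image, factor):
    """Box-filter and decimate a grayscale image (assumed: list of rows, 0..255).

    Each factor x factor block is averaged (the linear filter) and replaced
    by a single pixel (the decimation). Trailing rows and columns that do
    not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height = len(image) // factor
    width = len(image[0]) // factor if image else 0
    block_size = factor * factor
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += image[y][x]
            # Integer mean, rounded to nearest (halves round up).
            row.append((total + block_size // 2) // block_size)
        out.append(row)
    return out


# A one-pixel-wide black stroke (0) on a white page (255), reduced by 2.
page = [
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
]
print(box_downsample(page, 2))  # [[128, 255], [128, 255]]
```

In this toy case the thin stroke is averaged into mid-grey instead of staying black, which may be one of the low-resolution effects the summary alludes to.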