Using OCR and equalization to downsample documents

Bibliographic details
Main authors: Agazzi, O.E., Church, K.W., Gale, W.A.
Format: Conference paper
Language: English
Description
Abstract: Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable.
DOI: 10.1109/ICPR.1994.576925
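
Illustration: the abstract says that linear filtering and decimation do not work well at low resolutions. The following is a minimal sketch of that baseline only, not the paper's nonlinear equalization or its font-substitution step. It assumes a bilevel page stored as rows of 0/1 values (1 = ink), uses a box filter as the linear low-pass filter, and thresholds the result at 0.5; the function name `box_decimate` and its parameters are hypothetical and do not come from the paper.

```python
def box_decimate(bitmap, factor, threshold=0.5):
    """Downsample a bilevel bitmap (rows of 0/1, 1 = ink) by an integer factor.

    Each factor x factor block is averaged (a box filter, the simplest
    linear low-pass filter) and then thresholded back to 0 or 1.
    Trailing rows/columns that do not fill a whole block are dropped.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            # Count ink pixels in the current block.
            ink = sum(bitmap[r + dr][c + dc]
                      for dr in range(factor)
                      for dc in range(factor))
            out_row.append(1 if ink / (factor * factor) >= threshold else 0)
        out.append(out_row)
    return out


# Two 8x8 test bitmaps: a one-pixel-wide vertical stroke and a four-pixel-wide one.
thin = [[1 if c == 1 else 0 for c in range(8)] for _ in range(8)]
thick = [[1 if c < 4 else 0 for c in range(8)] for _ in range(8)]

print(box_decimate(thin, 4))   # the thin stroke covers 25% of each block and is lost
print(box_decimate(thick, 4))  # the thick stroke fills its blocks and survives
```

With a factor of 4 (for example, 300 dpi down to 75 dpi), the thin stroke disappears entirely while the thick one is kept, which is one way to see why plain decimation damages fine character detail at terminal resolutions.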