Using OCR and equalization to downsample documents

Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal proc...

Full description

Bibliographic details
Main authors: Agazzi, O.E.; Church, K.W.; Gale, W.A.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 309 vol.2
container_issue
container_start_page 305
container_title
container_volume 2
creator Agazzi, O.E.
Church, K.W.
Gale, W.A.
description Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable.
doi_str_mv 10.1109/ICPR.1994.576925
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISBN: 9780818662706
ispartof Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), 1994, Vol.2, p.305-309 vol.2
issn
language eng
recordid cdi_ieee_primary_576925
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Filtering
Gray-scale
Low pass filters
Maximum likelihood detection
Nonlinear filters
Optical character recognition software
Signal processing
Signal processing algorithms
Signal resolution
title Using OCR and equalization to downsample documents
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A58%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Using%20OCR%20and%20equalization%20to%20downsample%20documents&rft.btitle=Proceedings%20of%20the%2012th%20IAPR%20International%20Conference%20on%20Pattern%20Recognition,%20Vol.%203%20-%20Conference%20C:%20Signal%20Processing%20(Cat.%20No.94CH3440-5)&rft.au=Agazzi,%20O.E.&rft.date=1994&rft.volume=2&rft.spage=305&rft.epage=309%20vol.2&rft.pages=305-309%20vol.2&rft.isbn=9780818662706&rft.isbn_list=0818662700&rft_id=info:doi/10.1109/ICPR.1994.576925&rft_dat=%3Cieee_6IE%3E576925%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=576925&rfr_iscdi=true
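
The description field above names two steps that can be illustrated in a few lines: the linear filter-and-decimate baseline, and the hybrid rule "downsample by font substitution when possible, and decimate when necessary". The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the box (averaging) filter, the `(label, bitmap)` region format, and the `glyphs` lookup table are all hypothetical choices made for illustration. The nonlinear document equalization step is not sketched, because the abstract does not describe how it works.

```python
def box_decimate(image, factor):
    """Linear baseline: average each factor x factor block of a grayscale
    image (list of rows) and keep one output sample per block.
    The box filter is an assumed choice of low-pass filter."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows = len(image) // factor
    cols = len(image[0]) // factor if image else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            for dr in range(factor):
                for dc in range(factor):
                    total += image[r * factor + dr][c * factor + dc]
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


def hybrid_downsample(regions, glyphs, factor):
    """Hybrid rule from the abstract, in hypothetical form.

    regions: list of (label, bitmap) pairs, where label is the character
             reported by OCR, or None when OCR gave no answer.
    glyphs:  dict mapping a character to a low-resolution bitmap
             (a stand-in for a font designed for low-resolution devices).
    """
    out = []
    for label, bitmap in regions:
        if label is not None and label in glyphs:
            out.append(glyphs[label])                  # font substitution
        else:
            out.append(box_decimate(bitmap, factor))   # fall back to decimation
    return out


# A one-pixel-wide vertical stroke, downsampled by a factor of 2.
stroke = [[0, 1, 0, 0] for _ in range(4)]
print(box_decimate(stroke, 2))  # [[0.5, 0.0], [0.5, 0.0]]
```

In this toy case the stroke comes out as half-intensity gray, which is one way to see why plain filtering and decimation can look poor at low resolutions. Mixing substituted glyphs with decimated regions, as `hybrid_downsample` does, would leave the two kinds of output visually different, which matches the "ransom note" effect the abstract says equalization is meant to smooth over.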