Using OCR and equalization to downsample documents
Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal proc...
Main authors: | Agazzi, O.E. ; Church, K.W. ; Gale, W.A. |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 309 vol.2 |
---|---|
container_issue | |
container_start_page | 305 |
container_title | |
container_volume | 2 |
creator | Agazzi, O.E. ; Church, K.W. ; Gale, W.A. |
description | Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable. |
doi_str_mv | 10.1109/ICPR.1994.576925 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_576925</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>576925</ieee_id><sourcerecordid>576925</sourcerecordid><originalsourceid>FETCH-LOGICAL-i174t-3b10556f998797b44f8ed5697b1e6e74e3dcf25627968642a67833d7e808d75e3</originalsourceid><addsrcrecordid>eNotT0tLAzEYDIhQqXsvnvIHds37S46y-CgUKsWeS9p8K5HdbG22iP56A-1cZpjDPAhZcNZwztzjsn3fNNw51WgwTugbUjmwzHJrjABmZqTK-YsVaF0se0fENsf0SdfthvoUKH6ffR___BTHRKeRhvEnZT8ceyzycB4wTfme3Ha-z1hdeU62L88f7Vu9Wr8u26dVHTmoqZZ7XlpM55wFB3ulOotBmyI5GgSFMhw6ocsuZ6xRwhuwUgbAsjeARjknD5fciIi74ykO_vS7uxyT_17_QfM</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Using OCR and equalization to downsample documents</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Agazzi, O.E. ; Church, K.W. ; Gale, W.A.</creator><creatorcontrib>Agazzi, O.E. ; Church, K.W. ; Gale, W.A.</creatorcontrib><description>Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. 
We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable.</description><identifier>ISBN: 9780818662706</identifier><identifier>ISBN: 0818662700</identifier><identifier>DOI: 10.1109/ICPR.1994.576925</identifier><language>eng</language><publisher>IEEE</publisher><subject>Filtering ; Gray-scale ; Low pass filters ; Maximum likelihood detection ; Nonlinear filters ; Optical character recognition software ; Signal processing ; Signal processing algorithms ; Signal resolution</subject><ispartof>Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), 1994, Vol.2, p.305-309 vol.2</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/576925$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,4036,4037,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/576925$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Agazzi, O.E.</creatorcontrib><creatorcontrib>Church, K.W.</creatorcontrib><creatorcontrib>Gale, W.A.</creatorcontrib><title>Using OCR and equalization to downsample documents</title><title>Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5)</title><addtitle>ICPR</addtitle><description>Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. 
To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable.</description><subject>Filtering</subject><subject>Gray-scale</subject><subject>Low pass filters</subject><subject>Maximum likelihood detection</subject><subject>Nonlinear filters</subject><subject>Optical character recognition software</subject><subject>Signal processing</subject><subject>Signal processing algorithms</subject><subject>Signal resolution</subject><isbn>9780818662706</isbn><isbn>0818662700</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>1994</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotT0tLAzEYDIhQqXsvnvIHds37S46y-CgUKsWeS9p8K5HdbG22iP56A-1cZpjDPAhZcNZwztzjsn3fNNw51WgwTugbUjmwzHJrjABmZqTK-YsVaF0se0fENsf0SdfthvoUKH6ffR___BTHRKeRhvEnZT8ceyzycB4wTfme3Ha-z1hdeU62L88f7Vu9Wr8u26dVHTmoqZZ7XlpM55wFB3ulOotBmyI5GgSFMhw6ocsuZ6xRwhuwUgbAsjeARjknD5fciIi74ykO_vS7uxyT_17_QfM</recordid><startdate>1994</startdate><enddate>1994</enddate><creator>Agazzi, O.E.</creator><creator>Church, K.W.</creator><creator>Gale, 
W.A.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>1994</creationdate><title>Using OCR and equalization to downsample documents</title><author>Agazzi, O.E. ; Church, K.W. ; Gale, W.A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i174t-3b10556f998797b44f8ed5697b1e6e74e3dcf25627968642a67833d7e808d75e3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>1994</creationdate><topic>Filtering</topic><topic>Gray-scale</topic><topic>Low pass filters</topic><topic>Maximum likelihood detection</topic><topic>Nonlinear filters</topic><topic>Optical character recognition software</topic><topic>Signal processing</topic><topic>Signal processing algorithms</topic><topic>Signal resolution</topic><toplevel>online_resources</toplevel><creatorcontrib>Agazzi, O.E.</creatorcontrib><creatorcontrib>Church, K.W.</creatorcontrib><creatorcontrib>Gale, W.A.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Agazzi, O.E.</au><au>Church, K.W.</au><au>Gale, W.A.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Using OCR and equalization to downsample documents</atitle><btitle>Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. 
No.94CH3440-5)</btitle><stitle>ICPR</stitle><date>1994</date><risdate>1994</risdate><volume>2</volume><spage>305</spage><epage>309 vol.2</epage><pages>305-309 vol.2</pages><isbn>9780818662706</isbn><isbn>0818662700</isbn><abstract>Documents need to be sampled at different rates for different output devices: 300-600 dpi for laser printers, 100-200 dpi for fax, and 75-100 dpi for bitmap terminals. To output a high resolution document on a low resolution device, it may be necessary to introduce downsampling. Standard signal processing techniques such as linear filtering and decimation don't work very well at low resolutions. Better results are obtained by a nonlinear filtering technique we introduce in this paper, called nonlinear document equalization. Even better results are obtained by taking advantage of fonts designed specifically for bitmap terminals and other low resolution devices. However, character-level information is required to make use of fonts. This information is not always available; OCR is not 100% accurate. We propose a hybrid approach: downsample by font substitution when possible, and decimate when necessary. Unfortunately, the result tends to look like a "ransom note". Equalization is used to blend the two cases together so that gaps in the OCR analysis become almost unnoticeable.</abstract><pub>IEEE</pub><doi>10.1109/ICPR.1994.576925</doi></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9780818662706 |
ispartof | Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), 1994, Vol.2, p.305-309 vol.2 |
issn | |
language | eng |
recordid | cdi_ieee_primary_576925 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Filtering ; Gray-scale ; Low pass filters ; Maximum likelihood detection ; Nonlinear filters ; Optical character recognition software ; Signal processing ; Signal processing algorithms ; Signal resolution |
title | Using OCR and equalization to downsample documents |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A58%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Using%20OCR%20and%20equalization%20to%20downsample%20documents&rft.btitle=Proceedings%20of%20the%2012th%20IAPR%20International%20Conference%20on%20Pattern%20Recognition,%20Vol.%203%20-%20Conference%20C:%20Signal%20Processing%20(Cat.%20No.94CH3440-5)&rft.au=Agazzi,%20O.E.&rft.date=1994&rft.volume=2&rft.spage=305&rft.epage=309%20vol.2&rft.pages=305-309%20vol.2&rft.isbn=9780818662706&rft.isbn_list=0818662700&rft_id=info:doi/10.1109/ICPR.1994.576925&rft_dat=%3Cieee_6IE%3E576925%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=576925&rfr_iscdi=true |
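
The abstract names linear filtering followed by decimation as the standard baseline and says it does not work very well at low resolutions. The sketch below is only an illustration of that baseline, not the authors' nonlinear document equalization or their font-substitution method. It assumes a grayscale page stored as a list of rows with values in [0, 1], a box (averaging) filter as the linear filter, and a fixed 0.5 threshold for a bilevel display; the names `box_decimate` and `threshold` are hypothetical. Under these assumptions, a one-pixel-wide stroke sampled at 300 dpi can vanish when reduced by a factor of 3 to 100 dpi, which is one plausible way such a baseline degrades text.

```python
def box_decimate(image, factor):
    """Average each factor x factor block (box filter), then keep one sample per block."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(image) // factor
    cols = len(image[0]) // factor if image else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the pixels of the block that maps onto output pixel (r, c).
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def threshold(image, level=0.5):
    """Map gray values to 0/1 for a bilevel device (the level is an assumption)."""
    return [[1 if value >= level else 0 for value in row] for row in image]


# Hypothetical 3 x 6 patch at 300 dpi: a one-pixel-wide vertical stroke on the
# left, a solid three-pixel-wide stroke on the right (1 = ink, 0 = background).
page = [
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 1],
]

gray = box_decimate(page, 3)   # 300 dpi -> 100 dpi
bilevel = threshold(gray)      # the thin stroke averages to 1/3 and is lost
print(gray, bilevel)
```

In this toy case the thin stroke averages to 1/3 and falls below the threshold, while the solid stroke survives. The hybrid approach described in the abstract avoids this for recognized characters by substituting a font designed for low resolution, and falls back to decimation only where OCR gives no character-level information.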