Text-line extraction and character recognition of Japanese newspaper headlines with graphical designs

The conventional OCR fails to recognize most characters in Japanese newspaper headlines with graphical designs because of the difficulty of removing the designs. This paper proposes a method that recognizes such characters without removing the designs. First, text-line regions are extracted from a l...

Bibliographic details
Main authors: Sawaki, M., Hagita, N.
Format: Conference paper
Language: eng
container_end_page 78
container_issue
container_start_page 73
container_title
container_volume 3
creator Sawaki, M.
Hagita, N.
description The conventional OCR fails to recognize most characters in Japanese newspaper headlines with graphical designs because of the difficulty of removing the designs. This paper proposes a method that recognizes such characters without removing the designs. First, text-line regions are extracted from a local distribution of the combination of black and white runs observed in a rectangular window while the window is shifted pixel-by-pixel in the direction of the text-line. Characters in the extracted text-line region are then recognized by displacement matching. Adaptive thresholding against the degree of degradation suppresses spurious candidates yielded by displacement matching even with graphical designs. Experimental results for fifty Japanese newspaper headlines show that the method achieves a recognition rate of 97.7%, much higher than a conventional method (17.0%).
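The text-line extraction step in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how runs are measured or how the distribution is scored, so the sketch assumes a binary image (1 = black, 0 = white), a horizontal text line, runs measured down each pixel column, and a plain count of (black run, white run) length pairs per window. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def run_lengths(pixels: List[int]) -> List[Tuple[int, int]]:
    """Return (pixel_value, run_length) pairs for a 1-D sequence of 0/1 pixels."""
    runs: List[Tuple[int, int]] = []
    for px in pixels:
        if runs and runs[-1][0] == px:
            # Same value as the previous pixel: extend the current run.
            runs[-1] = (px, runs[-1][1] + 1)
        else:
            runs.append((px, 1))
    return runs


def black_white_pairs(pixels: List[int]) -> List[Tuple[int, int]]:
    """Pair each black run (1) with the white run (0) that immediately follows it."""
    runs = run_lengths(pixels)
    return [
        (first_len, second_len)
        for (first, first_len), (second, second_len) in zip(runs, runs[1:])
        if first == 1 and second == 0
    ]


def window_histograms(image: List[List[int]], width: int) -> List[Counter]:
    """Slide a window of `width` columns one pixel at a time along the text line.

    Assumption: the text line is horizontal and runs are scanned down each column.
    Returns one Counter of (black_len, white_len) pairs per window position.
    """
    n_cols = len(image[0])
    columns = [[row[c] for row in image] for c in range(n_cols)]
    per_column = [Counter(black_white_pairs(col)) for col in columns]
    histograms = []
    for start in range(n_cols - width + 1):
        total: Counter = Counter()
        for c in range(start, start + width):
            total.update(per_column[c])
        histograms.append(total)
    return histograms
```

A classifier that decides whether a window position belongs to a text line would consume these per-window histograms; the record does not describe that decision rule, so it is left out here.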
doi_str_mv 10.1109/ICPR.1996.546797
format Conference Proceeding
isbn 9780818672828
081867282X
publisher IEEE
fulltext fulltext_linktorsrc
identifier ISSN: 1051-4651
ispartof Proceedings of 13th International Conference on Pattern Recognition, 1996, Vol. 3, pp. 73-78
issn 1051-4651
2831-7475
language eng
recordid cdi_ieee_primary_546797
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Character recognition
Degradation
Design methodology
Image databases
Laboratories
Optical character recognition software
Optical devices
Pixel
Robustness
Software libraries
title Text-line extraction and character recognition of Japanese newspaper headlines with graphical designs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T13%3A51%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Text-line%20extraction%20and%20character%20recognition%20of%20Japanese%20newspaper%20headlines%20with%20graphical%20designs&rft.btitle=Proceedings%20of%2013th%20International%20Conference%20on%20Pattern%20Recognition&rft.au=Sawaki,%20M.&rft.date=1996&rft.volume=3&rft.spage=73&rft.epage=78%20vol.3&rft.pages=73-78%20vol.3&rft.issn=1051-4651&rft.eissn=2831-7475&rft.isbn=9780818672828&rft.isbn_list=081867282X&rft_id=info:doi/10.1109/ICPR.1996.546797&rft_dat=%3Cieee_6IE%3E546797%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=546797&rfr_iscdi=true