Text-line extraction and character recognition of Japanese newspaper headlines with graphical designs
The conventional OCR fails to recognize most characters in Japanese newspaper headlines with graphical designs because of the difficulty of removing the designs. This paper proposes a method that recognizes such characters without removing the designs. First, text-line regions are extracted from a l...
Main authors: | Sawaki, M. ; Hagita, N. |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 78 vol.3 |
---|---|
container_issue | |
container_start_page | 73 |
container_title | |
container_volume | 3 |
creator | Sawaki, M. ; Hagita, N. |
description | The conventional OCR fails to recognize most characters in Japanese newspaper headlines with graphical designs because of the difficulty of removing the designs. This paper proposes a method that recognizes such characters without removing the designs. First, text-line regions are extracted from a local distribution of the combination of black and white runs observed in a rectangular window while the window is shifted pixel-by-pixel in the direction of the text-line. Characters in the extracted text-line region are then recognized by displacement matching. Adaptive thresholding against the degree of degradation suppresses spurious candidates yielded by displacement matching even with graphical designs. Experimental results for fifty Japanese newspaper headlines show that the method achieves a recognition rate of 97.7%, much higher than a conventional method (17.0%). |
doi_str_mv | 10.1109/ICPR.1996.546797 |
format | Conference Proceeding |
isbn | 9780818672828 081867282X |
publisher | IEEE |
creationdate | 1996 |
linktohtml | https://ieeexplore.ieee.org/document/546797 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-4651 |
ispartof | Proceedings of 13th International Conference on Pattern Recognition, 1996, Vol.3, p.73-78 vol.3 |
issn | 1051-4651 2831-7475 |
language | eng |
recordid | cdi_ieee_primary_546797 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Character recognition ; Degradation ; Design methodology ; Image databases ; Laboratories ; Optical character recognition software ; Optical devices ; Pixel ; Robustness ; Software libraries |
title | Text-line extraction and character recognition of Japanese newspaper headlines with graphical designs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T13%3A51%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Text-line%20extraction%20and%20character%20recognition%20of%20Japanese%20newspaper%20headlines%20with%20graphical%20designs&rft.btitle=Proceedings%20of%2013th%20International%20Conference%20on%20Pattern%20Recognition&rft.au=Sawaki,%20M.&rft.date=1996&rft.volume=3&rft.spage=73&rft.epage=78%20vol.3&rft.pages=73-78%20vol.3&rft.issn=1051-4651&rft.eissn=2831-7475&rft.isbn=9780818672828&rft.isbn_list=081867282X&rft_id=info:doi/10.1109/ICPR.1996.546797&rft_dat=%3Cieee_6IE%3E546797%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=546797&rfr_iscdi=true |
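
The description field mentions a local distribution of combinations of black and white runs, observed in a rectangular window that is shifted pixel-by-pixel along the text line. The record gives no further detail, so the following is only a minimal illustrative sketch of that idea, not the authors' implementation. The function names, the 1 = black / 0 = white convention, and the choice of a (black run, white run) length-pair histogram as the feature are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # assumed convention: 1 = black pixel, 0 = white pixel


def run_lengths(row: List[int]) -> List[Tuple[int, int]]:
    """Return (pixel_value, run_length) pairs for one binary scanline."""
    runs: List[Tuple[int, int]] = []
    for px in row:
        if runs and runs[-1][0] == px:
            # Same value as the previous pixel: extend the current run.
            runs[-1] = (px, runs[-1][1] + 1)
        else:
            # Value changed (or first pixel): start a new run.
            runs.append((px, 1))
    return runs


def window_run_histogram(image: Image, x: int, width: int) -> Dict[Tuple[int, int], int]:
    """Count (black_run, white_run) length pairs in columns [x, x + width).

    Hypothetical feature: each black run immediately followed by a white run
    on the same scanline contributes one count to the histogram.
    """
    hist: Dict[Tuple[int, int], int] = {}
    for row in image:
        runs = run_lengths(row[x:x + width])
        for (v1, n1), (v2, n2) in zip(runs, runs[1:]):
            if v1 == 1 and v2 == 0:
                hist[(n1, n2)] = hist.get((n1, n2), 0) + 1
    return hist


def slide(image: Image, width: int) -> List[Dict[Tuple[int, int], int]]:
    """Shift the window one pixel at a time along a horizontal text line."""
    n_cols = len(image[0])
    return [window_run_histogram(image, x, width) for x in range(n_cols - width + 1)]
```

For a single scanline such as `[[1, 1, 0, 1, 0, 0]]` and a window width of 4, `slide` returns one histogram per window position. How such histograms would be turned into a text-line decision is not stated in this record, and the sketch leaves it out.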