HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and...

Bibliographic details
Main authors: Zhang, Yuyi; Zhu, Yuanzhi; Peng, Dezhi; Zhang, Peirong; Yang, Zhenhua; Yang, Zhibo; Yao, Cong; Jin, Lianwen
Format: Article
Language: English
creator Zhang, Yuyi
Zhu, Yuanzhi
Peng, Dezhi
Zhang, Peirong
Yang, Zhenhua
Yang, Zhibo
Yao, Cong
Jin, Lianwen
description Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters. HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character. This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features, a notable advantage over existing methods. Extensive experiments across diverse benchmarks, including handwritten, scene, document, web, and ancient text, have showcased HierCode's superiority for both conventional and zero-shot Chinese character or text recognition, exhibiting state-of-the-art performance with significantly fewer parameters and fast inference speed.
doi_str_mv 10.48550/arxiv.2403.13761
format Article
creationdate 2024-03-20
rights http://creativecommons.org/licenses/by/4.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2403.13761
backlink https://doi.org/10.48550/arXiv.2403.13761
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2403.13761
language eng
recordid cdi_arxiv_primary_2403_13761
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T10%3A17%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HierCode:%20A%20Lightweight%20Hierarchical%20Codebook%20for%20Zero-shot%20Chinese%20Text%20Recognition&rft.au=Zhang,%20Yuyi&rft.date=2024-03-20&rft_id=info:doi/10.48550/arxiv.2403.13761&rft_dat=%3Carxiv_GOX%3E2403_13761%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true