Unstructured document format recognition method based on image data processing

The invention discloses a method for recognizing the format of unstructured documents based on image data processing, which comprises the following steps: S1, opening and parsing a file, and converting the unstructured document format into a picture format; S2, angle correction is carried out on the pictur...

Bibliographic details
Main authors: ZHANG DAPING, ZHOU CHUANG, JIN ZHENGLEI
Format: Patent
Language: chi ; eng
creator ZHANG DAPING
ZHOU CHUANG
JIN ZHENGLEI
description The invention discloses a method for recognizing the format of unstructured documents based on image data processing. The method comprises the following steps: S1, open and parse a file, and convert the unstructured document format into a picture format; S2, correct the angle of the picture obtained in S1, specifically: a) apply a Hough transform to the picture to detect the straight-line angle of each text line in the image. By correcting the converted picture so that its content is aligned horizontally and vertically, the method is stated to greatly improve the recognition rate of the OCR text detection and recognition unit. The text recognized by that unit is then typeset so that the recognized content stays consistent with the specifications and styles of the original file.
format Patent
date 2023-02-03
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230203&DB=EPODOC&CC=CN&NR=115690806A
language chi ; eng
recordid cdi_epo_espacenet_CN115690806A
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
PHYSICS
title Unstructured document format recognition method based on image data processing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T12%3A20%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=ZHANG%20DAPING&rft.date=2023-02-03&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN115690806A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
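
The angle-correction step in the description (S2, a) uses a Hough transform to find the angle of the text lines. The record does not say how this is implemented, so the following is only a minimal sketch of the general technique, written with the Python standard library instead of an image library. The function names dominant_line_angle and rotate_points, the parameters angle_step_deg and max_skew_deg, and the choice to score each angle by its fullest accumulator bin are all assumptions made for illustration, not details taken from the patent.

```python
import math
from collections import Counter


def dominant_line_angle(points, angle_step_deg=0.5, max_skew_deg=45.0):
    """Estimate the skew of text lines with a coarse Hough transform.

    points: iterable of (x, y) foreground pixel coordinates (assumed input).
    Returns the skew in degrees, measured in the coordinate system of the
    points; 0 means the dominant lines run along the x axis.
    """
    points = list(points)
    best_votes, best_skew = -1, 0.0
    steps = int(round(2 * max_skew_deg / angle_step_deg))
    for i in range(steps + 1):
        skew = -max_skew_deg + i * angle_step_deg
        # A line whose direction is `skew` has a normal at skew + 90 degrees.
        theta = math.radians(skew + 90.0)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Every point votes for the rho bin of the line through it at this angle.
        bins = Counter(round(x * cos_t + y * sin_t) for x, y in points)
        votes = max(bins.values())
        # Keep the angle whose fullest bin collects the most collinear points.
        if votes > best_votes:
            best_votes, best_skew = votes, skew
    return best_skew


def rotate_points(points, angle_deg):
    """Rotate (x, y) points about the origin by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


if __name__ == "__main__":
    # Synthetic "text lines": three parallel rows tilted by 5 degrees.
    slope = math.tan(math.radians(5.0))
    pts = [(x, round(off + x * slope)) for off in (0, 30, 60) for x in range(200)]
    skew = dominant_line_angle(pts)
    print("estimated skew (degrees):", skew)
    # Rotating by the negative skew brings the rows back to horizontal.
    corrected = rotate_points(pts, -skew)
```

The estimate can only be as fine as angle_step_deg, and very short lines may fit several neighbouring angles equally well, so a real pipeline would probably run this on binarized page images with a tested library implementation of the Hough transform.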