EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT

Data extraction and automatic validation from digitized documents in non-editable formats is disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types...
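
The workflow named in this record's abstract (classify the document, select rules by document type, extract field positions and values, validate, score, flag low-confidence fields) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names RULES, classify, score and extract, the regular expressions, the confidence values and the 0.8 threshold are hypothetical and are not taken from the patent. In particular, the abstract states that field positions and values are obtained with machine learning techniques; plain keyword matching and regular expressions stand in for those models here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical processing rules per document type:
# field name -> regex with exactly one capture group for the value.
RULES = {
    "invoice": {
        "invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)",
        "total": r"Total\s*:?\s*\$?([\d.,]+)",
    },
    "receipt": {
        "total": r"Amount\s+paid\s*:?\s*\$?([\d.,]+)",
    },
}


@dataclass
class Field:
    name: str
    value: Optional[str]
    position: Optional[Tuple[int, int]]  # character span in the digitized text
    confidence: float
    needs_review: bool


def classify(text: str) -> str:
    """Keyword stand-in for a trained document-type classifier."""
    return "invoice" if "invoice" in text.lower() else "receipt"


def score(name: str, value: Optional[str]) -> float:
    """Toy validation: return a confidence in [0, 1] for an extracted value."""
    if value is None:
        return 0.0  # field not found at all
    if name == "total":
        # A well-formed amount has two decimals, optionally with thousands separators.
        well_formed = re.fullmatch(r"\d+(?:,\d{3})*\.\d{2}", value) is not None
        return 0.95 if well_formed else 0.4
    return 0.9


def extract(text: str, threshold: float = 0.8) -> Tuple[str, List[Field]]:
    """Classify the text, apply the rules for its type, validate and flag fields."""
    doc_type = classify(text)
    fields = []
    for name, pattern in RULES[doc_type].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        value = match.group(1) if match else None
        position = match.span(1) if match else None
        confidence = score(name, value)
        # Fields scoring below the threshold are flagged for manual review.
        fields.append(Field(name, value, position, confidence, confidence < threshold))
    return doc_type, fields


if __name__ == "__main__":
    doc_type, fields = extract("INVOICE No: A1234\nTotal: $1,250.00")
    print(doc_type)
    for field in fields:
        print(field)
```

Keeping the rules in a table keyed by document type mirrors the abstract's statement that processing rules are selected based on the document type; a real system would replace classify and the regex lookup with trained models.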

Detailed description

Bibliographic details
Main authors: KOTNALA, Rahul, VISWANATHAN, Kumar, NARAYANAN, Srikanth, GHATAGE, Prakash, MANI, Rekha, KRISHNAN, Aravind, JAIN, Ashish, SAMPAT, Nirav, LAKSHMINARAYANAN, Kameshkumar, MAHAPATRA, Suvendu Kumar
Format: Patent
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator KOTNALA, Rahul
VISWANATHAN, Kumar
NARAYANAN, Srikanth
GHATAGE, Prakash
MANI, Rekha
KRISHNAN, Aravind
JAIN, Ashish
SAMPAT, Nirav
LAKSHMINARAYANAN, Kameshkumar
MAHAPATRA, Suvendu Kumar
description Data extraction and automatic validation from digitized documents in non-editable formats is disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types and based on the document type, document processing rules are selected for analyzing the digitized documents to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. Data fields with low confidence scores are flagged for manual review.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2018373711A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2018373711A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2018373711A13</originalsourceid><addsrcrecordid>eNrjZLB2jQgJcnQO8fRzVwh2dQxy9nB08nFV8PRz8w_ydQzx9PdTcAvy91VwVHDxdPcM8YxydVFw8XcO9XX1C-FhYE1LzClO5YXS3AzKbq4hzh66qQX58anFBYnJqXmpJfGhwUYGhhbG5sbmhoaOhsbEqQIAj5UqUg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT</title><source>esp@cenet</source><creator>KOTNALA, Rahul ; VISWANATHAN, Kumar ; NARAYANAN, Srikanth ; GHATAGE, Prakash ; MANI, Rekha ; KRISHNAN, Aravind ; JAIN, Ashish ; SAMPAT, Nirav ; LAKSHMINARAYANAN, Kameshkumar ; MAHAPATRA, Suvendu Kumar</creator><creatorcontrib>KOTNALA, Rahul ; VISWANATHAN, Kumar ; NARAYANAN, Srikanth ; GHATAGE, Prakash ; MANI, Rekha ; KRISHNAN, Aravind ; JAIN, Ashish ; SAMPAT, Nirav ; LAKSHMINARAYANAN, Kameshkumar ; MAHAPATRA, Suvendu Kumar</creatorcontrib><description>Data extraction and automatic validation from digitized documents in non-editable formats is disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types and based on the document type, document processing rules are selected for analyzing the digitized documents to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. 
Data fields with low confidence scores are flagged for manual review.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC COMMUNICATION TECHNIQUE ; ELECTRIC DIGITAL DATA PROCESSING ; ELECTRICITY ; HANDLING RECORD CARRIERS ; PHYSICS ; PICTORIAL COMMUNICATION, e.g. TELEVISION ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS</subject><creationdate>2018</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20181227&amp;DB=EPODOC&amp;CC=US&amp;NR=2018373711A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25563,76318</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20181227&amp;DB=EPODOC&amp;CC=US&amp;NR=2018373711A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>KOTNALA, Rahul</creatorcontrib><creatorcontrib>VISWANATHAN, Kumar</creatorcontrib><creatorcontrib>NARAYANAN, Srikanth</creatorcontrib><creatorcontrib>GHATAGE, Prakash</creatorcontrib><creatorcontrib>MANI, Rekha</creatorcontrib><creatorcontrib>KRISHNAN, Aravind</creatorcontrib><creatorcontrib>JAIN, Ashish</creatorcontrib><creatorcontrib>SAMPAT, Nirav</creatorcontrib><creatorcontrib>LAKSHMINARAYANAN, Kameshkumar</creatorcontrib><creatorcontrib>MAHAPATRA, Suvendu Kumar</creatorcontrib><title>EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT</title><description>Data extraction and automatic validation from digitized documents in non-editable formats is disclosed. 
Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types and based on the document type, document processing rules are selected for analyzing the digitized documents to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. Data fields with low confidence scores are flagged for manual review.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC COMMUNICATION TECHNIQUE</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>ELECTRICITY</subject><subject>HANDLING RECORD CARRIERS</subject><subject>PHYSICS</subject><subject>PICTORIAL COMMUNICATION, e.g. 
TELEVISION</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2018</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLB2jQgJcnQO8fRzVwh2dQxy9nB08nFV8PRz8w_ydQzx9PdTcAvy91VwVHDxdPcM8YxydVFw8XcO9XX1C-FhYE1LzClO5YXS3AzKbq4hzh66qQX58anFBYnJqXmpJfGhwUYGhhbG5sbmhoaOhsbEqQIAj5UqUg</recordid><startdate>20181227</startdate><enddate>20181227</enddate><creator>KOTNALA, Rahul</creator><creator>VISWANATHAN, Kumar</creator><creator>NARAYANAN, Srikanth</creator><creator>GHATAGE, Prakash</creator><creator>MANI, Rekha</creator><creator>KRISHNAN, Aravind</creator><creator>JAIN, Ashish</creator><creator>SAMPAT, Nirav</creator><creator>LAKSHMINARAYANAN, Kameshkumar</creator><creator>MAHAPATRA, Suvendu Kumar</creator><scope>EVB</scope></search><sort><creationdate>20181227</creationdate><title>EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT</title><author>KOTNALA, Rahul ; VISWANATHAN, Kumar ; NARAYANAN, Srikanth ; GHATAGE, Prakash ; MANI, Rekha ; KRISHNAN, Aravind ; JAIN, Ashish ; SAMPAT, Nirav ; LAKSHMINARAYANAN, Kameshkumar ; MAHAPATRA, Suvendu Kumar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2018373711A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2018</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC COMMUNICATION TECHNIQUE</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>ELECTRICITY</topic><topic>HANDLING RECORD CARRIERS</topic><topic>PHYSICS</topic><topic>PICTORIAL COMMUNICATION, e.g. 
TELEVISION</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><toplevel>online_resources</toplevel><creatorcontrib>KOTNALA, Rahul</creatorcontrib><creatorcontrib>VISWANATHAN, Kumar</creatorcontrib><creatorcontrib>NARAYANAN, Srikanth</creatorcontrib><creatorcontrib>GHATAGE, Prakash</creatorcontrib><creatorcontrib>MANI, Rekha</creatorcontrib><creatorcontrib>KRISHNAN, Aravind</creatorcontrib><creatorcontrib>JAIN, Ashish</creatorcontrib><creatorcontrib>SAMPAT, Nirav</creatorcontrib><creatorcontrib>LAKSHMINARAYANAN, Kameshkumar</creatorcontrib><creatorcontrib>MAHAPATRA, Suvendu Kumar</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>KOTNALA, Rahul</au><au>VISWANATHAN, Kumar</au><au>NARAYANAN, Srikanth</au><au>GHATAGE, Prakash</au><au>MANI, Rekha</au><au>KRISHNAN, Aravind</au><au>JAIN, Ashish</au><au>SAMPAT, Nirav</au><au>LAKSHMINARAYANAN, Kameshkumar</au><au>MAHAPATRA, Suvendu Kumar</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT</title><date>2018-12-27</date><risdate>2018</risdate><abstract>Data extraction and automatic validation from digitized documents in non-editable formats is disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types and based on the document type, document processing rules are selected for analyzing the digitized documents to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. 
Data fields with low confidence scores are flagged for manual review.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2018373711A1
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC COMMUNICATION TECHNIQUE
ELECTRIC DIGITAL DATA PROCESSING
ELECTRICITY
HANDLING RECORD CARRIERS
PHYSICS
PICTORIAL COMMUNICATION, e.g. TELEVISION
PRESENTATION OF DATA
RECOGNITION OF DATA
RECORD CARRIERS
title EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T12%3A32%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=KOTNALA,%20Rahul&rft.date=2018-12-27&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2018373711A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true