EXTRACTING SEARCHABLE INFORMATION FROM A DIGITIZED DOCUMENT

Bibliographic details
Main authors:	KOTNALA, Rahul, VISWANATHAN, Kumar, NARAYANAN, Srikanth, GHATAGE, Prakash, MANI, Rekha, KRISHNAN, Aravind, JAIN, Ashish, SAMPAT, Nirav, LAKSHMINARAYANAN, Kameshkumar, MAHAPATRA, Suvendu Kumar
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC COMMUNICATION TECHNIQUE; ELECTRIC DIGITAL DATA PROCESSING; ELECTRICITY; HANDLING RECORD CARRIERS; PHYSICS; PICTORIAL COMMUNICATION, e.g. TELEVISION; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS

Description
Summary:	Data extraction and automatic validation from digitized documents in non-editable formats are disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. The digitized documents are classified into one of a plurality of document types and, based on the document type, document processing rules are selected for analyzing the digitized documents to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. Data fields with low confidence scores are flagged for manual review.
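
The flow in the summary (classify the document, select rules for its type, extract field positions and values, validate, score, flag) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `RULES`, `REVIEW_THRESHOLD`, `classify`, and `extract_and_validate`, the two document types, the confidence values, and the threshold are all hypothetical. In particular, keyword matching and regular expressions stand in here for the machine learning techniques that the summary mentions but does not specify.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical processing rules per document type:
# field name -> (extraction pattern, validation pattern).
RULES = {
    "invoice": {
        "invoice_number": (r"Invoice\s*No[.:]?\s*(\S+)", r"[A-Z]{2,3}-\d{4,}"),
        "total": (r"Total[:\s]+([\d.,]+)", r"\d+\.\d{2}"),
    },
    "receipt": {
        "total": (r"Amount Paid[:\s]+([\d.,]+)", r"\d+\.\d{2}"),
    },
}

# Assumed cut-off: fields scoring below this go to manual review.
REVIEW_THRESHOLD = 0.8


@dataclass
class Field:
    name: str
    value: Optional[str]
    position: Optional[int]  # character offset of the value in the text
    confidence: float
    needs_review: bool


def classify(text: str) -> str:
    """Toy classifier: a keyword test standing in for a trained model."""
    return "invoice" if "invoice" in text.lower() else "receipt"


def extract_and_validate(text: str):
    """Select rules by document type, then extract, validate, and score fields."""
    doc_type = classify(text)
    fields = []
    for name, (extract_re, valid_re) in RULES[doc_type].items():
        match = re.search(extract_re, text, re.IGNORECASE)
        if match is None:
            # Field not found at all: lowest confidence.
            value, position, confidence = None, None, 0.0
        else:
            value, position = match.group(1), match.start(1)
            # Full confidence only if the value passes its validation pattern.
            confidence = 1.0 if re.fullmatch(valid_re, value) else 0.4
        fields.append(
            Field(name, value, position, confidence, confidence < REVIEW_THRESHOLD)
        )
    return doc_type, fields


if __name__ == "__main__":
    sample = "Invoice No: AB-12345\nTotal: 1,250.00"
    doc_type, fields = extract_and_validate(sample)
    print(doc_type)
    for field in fields:
        print(field)
```

In the sample text, the invoice number passes validation, while the total contains a thousands separator that the assumed validation pattern rejects, so that field receives a low score and is flagged for manual review.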