STRUCTURING UNSTRUCTURED DATA VIA OPTICAL CHARACTER RECOGNITION AND ANALYSIS

Bibliographic details
Main authors: Kumar, Manoranjan; Henryson, Stacy R; Miller, Bradley; Wilson, Matthew D; Penfil, II, Richard; Jog, Nikhil
Format: Patent
Language: English
Description
Summary: The present disclosure describes devices and methods of providing a technology environment for analyzing unstructured data to generate structured data. A set of electronic documents, each electronic document associated with a type of product, may be accessed. A data instance for each of the documents may be generated. The data instance may include a plurality of data fields that are based on the type of product. The electronic documents may be analyzed to identify values for each of the plurality of data fields. Analyzing the electronic documents may comprise applying a respective character recognition algorithm to respective electronic documents, and assigning a confidence factor to each of the values. The data instances comprising the values for each of the plurality of data fields may be stored in a second database.
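
The flow in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The field schemas, the `recognize` stand-in (which does no real character recognition and returns a fixed placeholder confidence), the regular-expression matching, and the SQLite table layout are all assumptions made for illustration only.

```python
import re
import sqlite3
from dataclasses import dataclass, field

# Hypothetical field schemas keyed by product type (not taken from the patent).
FIELDS_BY_PRODUCT = {
    "loan": ["borrower", "amount"],
    "insurance": ["policy_holder", "premium"],
}


@dataclass
class DataInstance:
    doc_id: str
    product_type: str
    # Maps field name -> (value, confidence factor between 0 and 1).
    values: dict = field(default_factory=dict)


def recognize(raw_text):
    """Stand-in for a character recognition algorithm.

    A real system would run OCR on a scanned image. Here the text is
    already available and the confidence is a fixed placeholder.
    """
    return raw_text, 0.9


def analyze(doc_id, product_type, raw_text):
    """Build a data instance whose fields depend on the product type."""
    text, ocr_confidence = recognize(raw_text)
    instance = DataInstance(doc_id, product_type)
    for name in FIELDS_BY_PRODUCT[product_type]:
        # Look for a "name: value" pair on a single line.
        match = re.search(rf"{re.escape(name)}\s*:\s*(.+)", text, re.IGNORECASE)
        if match:
            instance.values[name] = (match.group(1).strip(), ocr_confidence)
        else:
            # Field not found: record no value with zero confidence.
            instance.values[name] = (None, 0.0)
    return instance


def store(conn, instance):
    """Persist the instance in a separate (here: SQLite) database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_values "
        "(doc_id TEXT, product_type TEXT, field TEXT, value TEXT, confidence REAL)"
    )
    conn.executemany(
        "INSERT INTO field_values VALUES (?, ?, ?, ?, ?)",
        [
            (instance.doc_id, instance.product_type, name, value, confidence)
            for name, (value, confidence) in instance.values.items()
        ],
    )
    conn.commit()
```

For example, `analyze("doc-1", "loan", "Borrower: Jane Doe\nAmount: 1000")` yields an instance with one value and one confidence factor per field, which `store` can then write to a connection such as `sqlite3.connect(":memory:")`. Keeping the confidence next to each value lets downstream consumers filter or route low-confidence fields for review.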