Label and field identification without optical character recognition (OCR)

Bibliographic details
Main authors:	Moise, Daniel L; Ramaswamy, Pallavika; Porcina, Sheldon; Becker, Richard J
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTING; COUNTING; PHYSICS

Description
Abstract:	A method for identifying form fields in a digital image, the method comprising: training a machine-learning model using a collection of training instances, each having an assigned classification; receiving, over a network, a digital image of a form taken by a smartphone digital camera; segmenting the digital image into a plurality of image segments, each comprising a set of pixels; detecting a plurality of features in a first one of the image segments; detecting a plurality of defects in the first image segment, including one or more defects that would increase the processing time of an optical character recognition process on the first image segment; extracting the plurality of features from the first image segment; determining, using the machine-learning model, that the first image segment depicts a field in the form based on the plurality of features; and classifying the field using the machine-learning model, wherein the machine-learning model assigns a classification to the field based on the plurality of features.
[Figure: Training Data 108, Training Image Segments 502, Training Instances 504, Feature Extractor 204, Segment Classifier 206, Unclassified Instance 408, Machine-Learning Model, Output Classification 508]
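The abstract does not say how the image is segmented, which features or defects are measured, or what kind of model is trained. The following is therefore only a minimal Python sketch of the claimed pipeline under stated assumptions. The image is a grayscale 2D list. Segments are fixed-size square tiles. The features are three hypothetical OCR-free layout measures: ink density, the fraction of full-width dark rows, and the fraction of full-height dark columns. Defects are flagged by two hypothetical checks, low contrast and isolated speckle pixels. A nearest-centroid classifier stands in for the unspecified machine-learning model. All names (`segment`, `extract_features`, `detect_defects`, `NearestCentroid`, `identify_fields`) and thresholds (`DARK`, the contrast floor of 40, the 1% speckle tolerance) are illustrative inventions, not taken from the patent.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean

Image = list[list[int]]  # hypothetical representation: grayscale rows, 0 = black, 255 = white
DARK = 128  # assumed threshold: pixels below it count as "ink"


def segment(image: Image, tile: int) -> list[Image]:
    """Split the image into non-overlapping square tiles (assumed segmentation scheme)."""
    return [
        [row[left:left + tile] for row in image[top:top + tile]]
        for top in range(0, len(image), tile)
        for left in range(0, len(image[0]), tile)
    ]


def extract_features(seg: Image) -> tuple[float, float, float]:
    """Hypothetical OCR-free features: ink density, full-width dark rows, full-height dark columns."""
    h, w = len(seg), len(seg[0])
    dark = [[p < DARK for p in row] for row in seg]
    density = sum(map(sum, dark)) / (h * w)
    rule_rows = sum(1 for row in dark if sum(row) >= 0.8 * w) / h
    rule_cols = sum(1 for c in range(w) if sum(dark[r][c] for r in range(h)) >= 0.8 * h) / w
    return (density, rule_rows, rule_cols)


def detect_defects(seg: Image) -> list[str]:
    """Flag hypothetical defects that would slow an OCR pass over this segment."""
    h, w = len(seg), len(seg[0])
    pixels = [p for row in seg for p in row]
    defects = []
    if max(pixels) - min(pixels) < 40:  # assumed contrast floor
        defects.append("low_contrast")
    specks = 0
    for r in range(h):
        for c in range(w):
            neighbours = [seg[y][x] for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= y < h and 0 <= x < w]
            # an "ink" pixel with no dark 4-neighbour counts as speckle noise
            if seg[r][c] < DARK and all(n >= DARK for n in neighbours):
                specks += 1
    if specks > 0.01 * h * w:  # assumed speckle tolerance
        defects.append("speckle_noise")
    return defects


@dataclass
class NearestCentroid:
    """Stand-in for the abstract's unspecified machine-learning model."""
    centroids: dict[str, tuple[float, ...]]

    @classmethod
    def train(cls, instances: list[tuple[tuple[float, ...], str]]) -> "NearestCentroid":
        groups: dict[str, list[tuple[float, ...]]] = {}
        for features, label in instances:
            groups.setdefault(label, []).append(features)
        # each class centroid is the per-feature mean of its training instances
        return cls({label: tuple(map(mean, zip(*feats))) for label, feats in groups.items()})

    def predict(self, features: tuple[float, ...]) -> str:
        return min(self.centroids, key=lambda label: dist(features, self.centroids[label]))


def identify_fields(image: Image, model: NearestCentroid, tile: int) -> list[dict]:
    """Return the segments the model labels as fields, with their class and defects."""
    results = []
    for index, seg in enumerate(segment(image, tile)):
        label = model.predict(extract_features(seg))
        if label != "not_field":  # the segment depicts a field; the label is its classification
            results.append({"segment": index, "label": label, "defects": detect_defects(seg)})
    return results


def blank(n: int) -> Image:
    return [[255] * n for _ in range(n)]


def underline(n: int) -> Image:
    tile = blank(n)
    tile[-1] = [0] * n
    return tile


def box(n: int) -> Image:
    tile = blank(n)
    tile[0], tile[-1] = [0] * n, [0] * n
    for row in tile:
        row[0] = row[-1] = 0
    return tile


training = [(extract_features(t), label)  # hypothetical labelled instances from synthetic tiles
            for t, label in [(blank(8), "not_field"), (underline(8), "text_line"), (box(8), "checkbox")]]
model = NearestCentroid.train(training)
image = [a + b + c for a, b, c in zip(blank(8), underline(8), box(8))]  # 8 x 24 form strip
print(identify_fields(image, model, 8))
# [{'segment': 1, 'label': 'text_line', 'defects': []}, {'segment': 2, 'label': 'checkbox', 'defects': []}]
```

In this sketch a single prediction covers both claimed steps: any label other than `not_field` means the segment depicts a field, and the label itself is the field's classification. The defect list is only reported alongside each field, since the abstract does not say how detected defects feed into the decision.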