IMAGE READING SYSTEMS, METHODS AND STORAGE MEDIUM FOR PERFORMING GEOMETRIC EXTRACTION

Bibliographic details
Main authors: Stuhlsatz, Bryan Lee; Tomar, Ankur; Kolavennu, Soumitri Naga; Sriram, Varshini; Roque, Rodolfo Carriedo; Basavaraju, Lavanya
Format: Patent
Language: English
Description
Abstract: Geometric extraction is performed on an unstructured document by recognizing textual blocks on at least a portion of a page of the unstructured document, generating bounding boxes that surround and correspond to the textual blocks, determining search paths having coordinates of two endpoints and connecting at least two bounding boxes, and generating a graph representation of the at least a portion of the page, the graph representation including the plurality of textual blocks, the coordinates of the vertices of each bounding box and the coordinates of the two endpoints of each search path.
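
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how textual blocks are recognized or how search paths are chosen. The sketch assumes the blocks and their axis-aligned bounding boxes have already been produced by some OCR step, and it assumes, purely for illustration, that a search path is a straight segment between the centers of two boxes that lie within a distance threshold. The names TextBlock, build_graph and max_distance are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TextBlock:
    """A recognized textual block with an axis-aligned bounding box (assumed input)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def vertices(self):
        # Four corners of the bounding box: top-left, top-right, bottom-right, bottom-left.
        return [(self.x0, self.y0), (self.x1, self.y0),
                (self.x1, self.y1), (self.x0, self.y1)]

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def build_graph(blocks, max_distance=math.inf):
    """Build a graph holding the blocks, their box vertices, and search-path endpoints.

    Assumption: a search path joins the centers of two boxes whose centers are
    at most max_distance apart. The source does not specify this rule.
    """
    nodes = [
        {"id": i, "text": b.text, "bbox_vertices": b.vertices()}
        for i, b in enumerate(blocks)
    ]
    edges = []
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        (ax, ay), (bx, by) = a.center(), b.center()
        if math.hypot(bx - ax, by - ay) <= max_distance:
            edges.append({
                "source": i,
                "target": j,
                "endpoints": [(ax, ay), (bx, by)],  # the two endpoints of the path
            })
    return {"nodes": nodes, "edges": edges}


# Made-up blocks standing in for OCR output on part of a page.
blocks = [
    TextBlock("Invoice No.", 10, 10, 110, 30),
    TextBlock("12345", 130, 10, 190, 30),
    TextBlock("Total", 10, 400, 60, 420),
]
graph = build_graph(blocks, max_distance=150)
for edge in graph["edges"]:
    print(edge["source"], "->", edge["target"], edge["endpoints"])
```

In this sketch each node carries the block text and the four vertex coordinates of its bounding box, and each edge carries the two endpoint coordinates of one search path, mirroring the three kinds of data the abstract lists for the graph representation. A real system would likely use a different rule for choosing which boxes to connect.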