Identification of layout and content flow of an unstructured document

Bibliographic details
Main authors: MANSFIELD PHILIP ANDREW, LEVY MICHAEL ROBERT
Format: Patent
Language: English

Abstract: Some embodiments provide a method for analyzing an unstructured document that includes a number of glyphs, each of which has a position in the unstructured document. Based on positions of the glyphs in the unstructured document, the method creates associations between different sets of glyphs in order to identify different sets of glyphs as different words. The method creates associations between different sets of words in order to identify different sets of words as different paragraphs. The method defines associations between paragraphs that are not contiguous in order to define a reading order for the paragraphs.
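The abstract describes a bottom-up analysis (glyphs to words, words to paragraphs, paragraphs to a reading order) but gives no thresholds or algorithms. The Python sketch below illustrates one simple way such a pipeline could work. It is not the patented method. Everything in it is a hypothetical assumption: the `Glyph`, `Word` and `Paragraph` types, the functions `make_words`, `split_segments`, `make_paragraphs`, `reading_order`, `analyze` and `typeset`, and the fixed thresholds `baseline_tol`, `word_gap`, `column_gap` and `max_leading`.

The sketch works in four steps:

- It groups glyphs that share a baseline.
- It splits each group into words at small horizontal gaps, and into line segments at wide gutters.
- It stacks vertically adjacent, horizontally overlapping lines into paragraphs.
- It orders paragraphs column by column. As a result, the last paragraph of one column is linked to the first paragraph of the next, even though the two are not contiguous on the page.

```python
# Hypothetical sketch: all names and thresholds are illustrative assumptions, not the patent's method.
from dataclasses import dataclass, field

@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph box
    y: float      # baseline; assumed to increase down the page
    width: float

@dataclass
class Word:
    text: str
    x0: float
    x1: float
    y: float

@dataclass
class Paragraph:
    lines: list = field(default_factory=list)  # each line: a list of Words sorted by x

    def x0(self):
        return min(line[0].x0 for line in self.lines)

    def x1(self):
        return max(line[-1].x1 for line in self.lines)

    def text(self):
        return " ".join(w.text for line in self.lines for w in line)

def overlaps(a0, a1, b0, b1):
    # True if the horizontal intervals [a0, a1) and [b0, b1) intersect.
    return a0 < b1 and b0 < a1

def make_words(row, word_gap):
    """Glyphs -> words: split one baseline row wherever the gap exceeds word_gap."""
    row = sorted(row, key=lambda g: g.x)
    groups, current = [], [row[0]]
    for prev, g in zip(row, row[1:]):
        if g.x - (prev.x + prev.width) > word_gap:
            groups.append(current)
            current = []
        current.append(g)
    groups.append(current)
    return [Word("".join(g.char for g in grp), grp[0].x, grp[-1].x + grp[-1].width, grp[0].y)
            for grp in groups]

def split_segments(words, column_gap):
    """Split a row of words at very wide gaps (such as a column gutter)."""
    segments, current = [], [words[0]]
    for prev, w in zip(words, words[1:]):
        if w.x0 - prev.x1 > column_gap:
            segments.append(current)
            current = []
        current.append(w)
    segments.append(current)
    return segments

def make_paragraphs(segments, max_leading):
    """Words -> paragraphs: join a line to a paragraph whose last line sits just above it."""
    paragraphs = []
    for seg in sorted(segments, key=lambda s: (s[0].y, s[0].x0)):
        for p in paragraphs:
            last = p.lines[-1]
            if (overlaps(seg[0].x0, seg[-1].x1, last[0].x0, last[-1].x1)
                    and 0 < seg[0].y - last[0].y <= max_leading):
                p.lines.append(seg)
                break
        else:
            paragraphs.append(Paragraph([seg]))
    return paragraphs

def reading_order(paragraphs):
    """Paragraphs -> order: group into columns, read columns left to right, top to bottom."""
    columns = []
    for p in sorted(paragraphs, key=lambda p: p.x0()):
        for col in columns:
            if overlaps(p.x0(), p.x1(), col[0].x0(), col[0].x1()):
                col.append(p)
                break
        else:
            columns.append([p])
    # Consecutive paragraphs in the result are linked; the link from the bottom of one
    # column to the top of the next joins paragraphs that are not contiguous on the page.
    return [p for col in columns for p in sorted(col, key=lambda p: p.lines[0][0].y)]

def analyze(glyphs, baseline_tol=2.0, word_gap=1.5, column_gap=20.0, max_leading=16.0):
    rows = []  # glyphs that share (approximately) the same baseline
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if rows and abs(g.y - rows[-1][0].y) <= baseline_tol:
            rows[-1].append(g)
        else:
            rows.append([g])
    segments = [s for row in rows for s in split_segments(make_words(row, word_gap), column_gap)]
    return reading_order(make_paragraphs(segments, max_leading))

def typeset(text, x, y, advance=5.0):
    """Hypothetical helper: fixed-width glyphs; a space leaves a gap instead of a glyph."""
    return [Glyph(ch, x + i * advance, y, advance) for i, ch in enumerate(text) if ch != " "]
```

Consider a two-column page built with `typeset`. The left column holds "Left one" and "column text" on consecutive baselines, with "Second para" further down. The right column holds "Right column" and "continues here". For this page, `analyze` returns "Left one column text", "Second para", "Right column continues here" in that order. "Second para" is therefore linked to the top of the right column. Real documents would likely need thresholds derived from font sizes and line spacing rather than the fixed constants assumed here.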