From paper to office document standard representation

The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout o...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computer (Long Beach, Calif.) Calif.), 1992-07, Vol.25 (7), p.63-67
Hauptverfasser: Dengel, A., Bleisinger, R., Hoch, R., Fein, F., Hones, F.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout objects such as text-blocks, lines, and words based on their presentation on the page. Subsequently, in a step called logical labeling, the layout objects and their compositions are geometrically analyzed to identify corresponding logical objects that can be related to a human perceptible meaning, such as sender, recipient, and date in a letter. A context-sensitive text recognition for logical objects is then applied using logical vocabularies and syntactic knowledge. As a result, Pi ODA produces a document representation that conforms to the ODA international standard.< >
ISSN:0018-9162
1558-0814
DOI:10.1109/2.144442