Model matching in intelligent document understanding

Bibliographic details
Main authors: Farrow, G.S.D., Xydeas, C.S., Oakley, J.P.
Format: Conference proceeding
Language: English
Description
Abstract: Intelligent Document Understanding (IDU) is the process of converting scanned document pages into an electronic, processable form. We have previously presented an IDU system architecture suitable for this task which uses a hybrid bottom-up/top-down control strategy. In this paper we focus on a specific subproblem that arises within the chosen framework, concerned with selecting an appropriate page layout structure. A detailed analysis of the problem using an error propagation model allows computationally simple search strategies to be developed. A multistage layout formation algorithm is proposed and its performance is critically assessed when implemented using two different Layout Object selection criteria. The first selection criterion is based on maximal column area coverage; the second is based on probabilistic Layout Object selection. Both techniques have been incorporated into the hybrid IDU system and the results presented indicate its superiority over previously reported systems.
DOI: 10.1109/ICDAR.1995.598997