Extracting information from handwritten content in census forms
Main authors: | , , , , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Summary: | In this paper, we describe our approach for extracting salient information from US census form images. These forms present several challenges including variations in individual form templates, skew, writing device, writing style, etc. We describe an innovative registration algorithm that is robust to scale variations for segmenting the input image into cells. Following registration, the borders of cells are removed using a shape-based rule-line removal algorithm to extract handwritten content from each cell. Finally, the individual cell images are recognized using a hidden Markov model (HMM) OCR system with language models biased for the type of information in the cell, such as person name, place name, numbers, marital status, gender, race, etc. |
---|---|
ISSN: | 1051-4651 2831-7475 |