Combining Deep Learning and Reasoning for Address Detection in Unstructured Text Documents
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Extracting information from unstructured text documents is a demanding task,
since these documents can have a wide variety of layouts and a
non-trivial reading order, as is the case for multi-column documents or
nested tables. Additionally, many business documents are received in paper
form, meaning that the textual contents need to be digitized before further
analysis. Nonetheless, automatic detection and capture of crucial document
information such as the sender address would boost many companies' processing
efficiency. In this work we propose a hybrid approach that combines deep
learning with reasoning for finding and extracting addresses from unstructured
text documents. We use a visual deep learning model to detect the boundaries of
possible address regions on the scanned document images and validate these
results by analyzing the text they contain using domain knowledge represented as
a rule-based system. |
---|---|
DOI: | 10.48550/arxiv.2202.03103 |
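
The validation step described in the abstract (checking the text inside each detected region against domain rules) might look roughly like the sketch below. It is not the authors' rule system: the `Candidate` type, the two regular expressions, and the German-style address layout (street plus house number, five-digit postal code plus place name) are all assumptions made for illustration. The visual detector and the OCR step are taken as given and are not shown.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative rules only (assumption, not taken from the paper):
# a line with a five-digit postal code followed by a place name ...
POSTAL_LINE = re.compile(r"^\d{5}\s+[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß .\-]*$")
# ... and a line with a street name followed by a house number, e.g. "Hauptstraße 12a".
STREET_LINE = re.compile(r"^[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß .\-]*\s\d+\s?[a-zA-Z]?$")


@dataclass
class Candidate:
    """Hypothetical output of the visual detector plus OCR for one region."""
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    text: str                       # recognized text inside the box


def looks_like_address(text: str) -> bool:
    """Accept a text block only if it has both a street line and a postal line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    has_postal = any(POSTAL_LINE.match(ln) for ln in lines)
    has_street = any(STREET_LINE.match(ln) for ln in lines)
    return has_postal and has_street


def validate(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the detected regions whose text passes the rules."""
    return [c for c in candidates if looks_like_address(c.text)]
```

Requiring both rules to fire is a deliberate choice in this sketch: a line such as "12345 items" matches the postal pattern on its own, so a single rule would let non-address regions through.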