DeepCPCFG: Deep Learning and Context Free Grammars for End-to-End Information Extraction
Format: | Article |
---|---|
Language: | English |
Abstract: | We address the challenge of extracting structured information from business
documents without detailed annotations. We propose Deep Conditional
Probabilistic Context Free Grammars (DeepCPCFG) to parse two-dimensional
complex documents and use Recursive Neural Networks to create an end-to-end
system for finding the most probable parse that represents the structured
information to be extracted. This system is trained end-to-end with scanned
documents as input and only relational records as labels. These
relational records are extracted from existing databases, avoiding the cost of
annotating documents by hand. We apply this approach to extract information
from scanned invoices achieving state-of-the-art results despite using no
hand-annotations. |
---|---|
DOI: | 10.48550/arxiv.2103.05908 |
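The central step named in the abstract, finding the most probable parse under a probabilistic grammar, can be illustrated with the classic Viterbi variant of the CKY algorithm. This is a simplified sketch, not the authors' implementation: it assumes a plain PCFG in Chomsky normal form with fixed rule probabilities over a one-dimensional token sequence. According to the abstract, DeepCPCFG instead parses two-dimensional documents and uses neural networks in a conditional model; neither aspect is modeled here. The function name `viterbi_cky` and the toy invoice-line grammar (`LINE`, `DESC`, `REST`, `QTY`, `PRICE`) are hypothetical.

```python
import math

def viterbi_cky(tokens, lexical, binary, start="S"):
    """Most probable parse of `tokens` under a PCFG in Chomsky normal form.

    Simplified illustration: a plain 1D PCFG with fixed probabilities,
    not the neural, two-dimensional DeepCPCFG parser.
    lexical: {(nonterminal, token): prob}
    binary:  {(parent, left_child, right_child): prob}
    Returns (log_prob, tree), or (-inf, None) if no parse exists.
    """
    n = len(tokens)
    # best[i][j][A] = (best log prob of A spanning tokens[i:j], backpointer)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans: apply lexical rules A -> token.
    for i, tok in enumerate(tokens):
        for (A, word), p in lexical.items():
            if word == tok and p > 0:
                lp = math.log(p)
                if lp > best[i][i + 1].get(A, (-math.inf, None))[0]:
                    best[i][i + 1][A] = (lp, tok)

    # Wider spans: combine two adjacent sub-spans with binary rules A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = best[i][j]
            for k in range(i + 1, j):
                left, right = best[i][k], best[k][j]
                for (A, B, C), p in binary.items():
                    if B in left and C in right and p > 0:
                        lp = math.log(p) + left[B][0] + right[C][0]
                        if lp > cell.get(A, (-math.inf, None))[0]:
                            cell[A] = (lp, (k, B, C))

    if start not in best[0][n]:
        return -math.inf, None

    def build(i, j, A):
        # Follow backpointers to rebuild the highest-scoring tree.
        _, bp = best[i][j][A]
        if j - i == 1:
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))

    return best[0][n][start][0], build(0, n, start)


# Hypothetical toy grammar for one invoice line item: description, quantity, price.
lexical = {
    ("DESC", "Widget"): 1.0,
    ("QTY", "2"): 0.6, ("PRICE", "2"): 0.4,
    ("PRICE", "9.99"): 0.7, ("QTY", "9.99"): 0.3,
}
binary = {
    ("LINE", "DESC", "REST"): 1.0,
    ("REST", "QTY", "PRICE"): 1.0,
}

log_prob, tree = viterbi_cky(["Widget", "2", "9.99"], lexical, binary, start="LINE")
print(round(math.exp(log_prob), 2))  # 0.42
print(tree)  # ('LINE', ('DESC', 'Widget'), ('REST', ('QTY', '2'), ('PRICE', '9.99')))
```

The toy grammar only admits a quantity followed by a price, so the tokens "2" and "9.99", each ambiguous on its own, receive a consistent labeling with score 1.0 × 0.6 × 0.7 = 0.42. Reading the labeled leaves of the tree yields a record (description, quantity, price), loosely analogous to the relational records the abstract uses as labels. Scores are kept in log space to avoid underflow on long inputs.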