LMDX: Language Model-based Document Information Extraction and Localization
Format: | Article |
---|---|
Language: | English |
Abstract: | Large Language Models (LLMs) have revolutionized Natural Language Processing
(NLP), improving the state of the art and exhibiting emergent capabilities across
various tasks. However, their application to extracting information from
visually rich documents, a task at the core of many document processing
workflows that involves extracting key entities from semi-structured
documents, has not yet been successful. The main obstacles to adopting LLMs for
this task are the absence of layout encoding within LLMs, which is critical
for high-quality extraction, and the lack of a grounding mechanism to localize
the predicted entities within the document. In this paper, we introduce
Language Model-based Document Information Extraction and Localization (LMDX), a
methodology to reframe the document information extraction task for an LLM. LMDX
enables extraction of singular, repeated, and hierarchical entities, both with
and without training data, while providing grounding guarantees and localizing
the entities within the document. Finally, we apply LMDX to the PaLM 2-S and
Gemini Pro LLMs and evaluate it on the VRDU and CORD benchmarks, setting a new
state of the art and showing how LMDX enables the creation of high-quality,
data-efficient parsers. |
---|---|
DOI: | 10.48550/arxiv.2309.10952 |
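The abstract names two obstacles: LLMs have no built-in layout encoding, and they lack a mechanism to ground predicted entities in the document. The Python sketch below illustrates one general way both ideas could fit together. It is not the paper's implementation or prompt format. It assumes that each OCR segment's box center is quantized into an `X|Y` coordinate token (with an assumed `BUCKETS = 100`), and that the LLM returns nested JSON in which every leaf string ends with such a token. A value is kept only if its text actually appears in a segment carrying that token. The names `Segment`, `coord_token`, `encode_document`, `ground_value`, and `ground_prediction` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical container for one OCR segment: text plus a box normalized to [0, 1].
@dataclass(frozen=True)
class Segment:
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

BUCKETS = 100  # assumed quantization granularity for coordinate tokens

def coord_token(seg: Segment, buckets: int = BUCKETS) -> str:
    """Quantize the segment's box center into an 'X|Y' layout token."""
    cx = (seg.x_min + seg.x_max) / 2
    cy = (seg.y_min + seg.y_max) / 2
    qx = min(int(cx * buckets), buckets - 1)
    qy = min(int(cy * buckets), buckets - 1)
    return f"{qx}|{qy}"

def encode_document(segments: list[Segment]) -> str:
    """Serialize the page for a text-only LLM: one 'text X|Y' line per segment."""
    return "\n".join(f"{s.text} {coord_token(s)}" for s in segments)

def ground_value(value: str, segments: list[Segment]) -> Optional[dict]:
    """Keep 'text X|Y' only if the text occurs in a segment with that token."""
    text, sep, token = value.rpartition(" ")
    if not sep or not text:
        return None
    for seg in segments:
        if coord_token(seg) == token and text in seg.text:
            return {"value": text,
                    "bbox": (seg.x_min, seg.y_min, seg.x_max, seg.y_max)}
    return None  # ungrounded (e.g. hallucinated) value is discarded

def ground_prediction(node: Any, segments: list[Segment]) -> Any:
    """Recursively ground nested JSON: singular, repeated, and hierarchical entities."""
    if isinstance(node, str):
        return ground_value(node, segments)
    if isinstance(node, list):
        items = (ground_prediction(n, segments) for n in node)
        return [i for i in items if i is not None]
    if isinstance(node, dict):
        out = {k: ground_prediction(v, segments) for k, v in node.items()}
        # Drop entities whose children were all discarded.
        return {k: v for k, v in out.items() if v not in (None, [], {})}
    return None

# Usage with made-up segments and a made-up LLM completion.
segments = [
    Segment("Invoice 4711", 0.10, 0.05, 0.30, 0.08),
    Segment("Coffee 3.50", 0.10, 0.40, 0.50, 0.43),
    Segment("Total 3.50", 0.60, 0.90, 0.90, 0.93),
]
prompt_doc = encode_document(segments)
completion = json.dumps({
    "invoice_id": f"4711 {coord_token(segments[0])}",
    "line_item": [{"description": f"Coffee {coord_token(segments[1])}"}],
    "total": f"9.99 {coord_token(segments[2])}",  # text not in segment: dropped
})
print(ground_prediction(json.loads(completion), segments))
```

Under these assumptions, the grounding check doubles as localization. A kept value inherits the bounding box of the segment it was matched to. A value whose text does not appear at the cited coordinates, such as the `total` above, is removed rather than returned.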