Document Intelligence Metrics for Visually Rich Document Evaluation

Bibliographic Details
Published in:	arXiv.org 2022-05
Main authors:	DeGange, Jonathan; Gupta, Swapnil; Han, Zhuoyu; Wilkosz, Krzysztof; Karwan, Adam
Format:	Article
Language:	English
Subjects:	Documents; Information retrieval; Intelligence; Libraries; Performance evaluation
Online access:	Full text

Description
Summary:	The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence. We introduce DI-Metrics, a Python library devoted to VRD model evaluation comprising text-based, geometric-based and hierarchical metrics for information extraction tasks. We apply DI-Metrics to evaluate information extraction performance using the publicly available CORD dataset, comparing the performance of three state-of-the-art (SOTA) models and one industry model. The open-source library is available on GitHub.
ISSN:	2331-8422