UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining
Format: Article
Language: English
Abstract: Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse. Prior work on table recognition (TR) has mainly centered around complex task-specific combinations of available inputs and tools. We present UniTable, a training framework that unifies both the training paradigm and training objective of TR. Its training paradigm combines the simplicity of purely pixel-level inputs with the effectiveness and scalability empowered by self-supervised pretraining from diverse unannotated tabular images. Our framework unifies the training objectives of all three TR tasks, extracting table structure, cell content, and cell bounding box, into a single task-agnostic training objective: language modeling. Extensive quantitative and qualitative analyses highlight UniTable's state-of-the-art (SOTA) performance on four of the largest TR datasets. UniTable's table parsing capability surpasses both existing TR methods and general large vision-language models such as GPT-4o, GPT-4-turbo with vision, and LLaVA. Our code is publicly available at https://github.com/poloclub/unitable, featuring a Jupyter Notebook with the complete inference pipeline, fine-tuned across multiple TR datasets and supporting all three TR tasks.
DOI: 10.48550/arxiv.2403.04822
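
The abstract's central idea, casting all three TR tasks as language modeling, can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary contents, the toy model, and the token choices below are illustrative assumptions. The point is that structure tags, cell characters, and quantized bounding-box coordinates all become plain tokens in one shared vocabulary, so a single next-token cross-entropy loss covers every task.

```python
# Minimal sketch (assumptions, not UniTable's code): three TR tasks,
# one language-modeling objective over a shared token vocabulary.
import torch
import torch.nn as nn

# Hypothetical shared vocabulary: HTML-like structure tags, cell-content
# characters, and quantized bbox coordinates are all just tokens.
vocab = ["<pad>", "<bos>", "<eos>",
         "<tr>", "</tr>", "<td>", "</td>",            # table structure
         *list("0123456789abcdef "),                   # cell content (chars)
         *[f"bin_{i}" for i in range(16)]]             # bbox coords, quantized
tok2id = {t: i for i, t in enumerate(vocab)}

class TinyDecoder(nn.Module):
    """Toy autoregressive decoder standing in for the visual encoder plus
    Transformer decoder used by image-to-sequence TR models."""
    def __init__(self, vocab_size, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

def lm_loss(model, seq):
    """Task-agnostic objective: predict token t+1 from tokens up to t."""
    ids = torch.tensor([[tok2id[t] for t in seq]])
    logits = model(ids[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1))

model = TinyDecoder(len(vocab))
# The same loss applies to all three tasks; only the target sequence differs.
structure = ["<bos>", "<tr>", "<td>", "</td>", "</tr>", "<eos>"]
content   = ["<bos>", *list("3a f0"), "<eos>"]
bbox      = ["<bos>", "bin_1", "bin_2", "bin_9", "bin_12", "<eos>"]
loss = sum(lm_loss(model, s) for s in (structure, content, bbox))
loss.backward()
```

Under this framing, a single self-supervised, pixel-level encoder can plausibly be fine-tuned per task without changing the loss, only the target serialization, which is what makes the objective task-agnostic.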