Complicated Table Structure Recognition
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Table structure recognition aims to recover the internal structure
of a table, a key step toward machine understanding of tables. Many studies
address this task for file formats such as ASCII text and HTML, and
recognizing table structure in PDF files has also attracted considerable
attention. However, existing methods struggle to accurately recognize the
structure of complicated tables in PDF files, that is, tables containing
spanning cells that occupy at least two columns or rows. To address this
issue, we propose GraphTSR, a novel graph neural network for recognizing
table structure in PDF files. Specifically, it takes table cells as input and
recognizes the table structure by predicting relations among cells. Moreover,
to better evaluate the task, we construct SciTSR, a large-scale table
structure recognition dataset built from scientific papers, which contains
15,000 tables from PDF files and their corresponding structure labels.
Extensive experiments demonstrate that the proposed model is highly effective
for complicated tables and outperforms state-of-the-art baselines on a
benchmark dataset and on our newly constructed dataset. |
---|---|
DOI: | 10.48550/arxiv.1908.04729 |
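
The summary defines a complicated table as one containing spanning cells and frames structure recognition as predicting relations among cells. The following is a minimal sketch of those two ideas, not the GraphTSR model itself: the `Cell` fields, the `is_spanning` and `relation` helpers, and the adjacency rule used to label a pair as horizontal or vertical are all assumptions introduced here for illustration. A learned model would predict such labels from cell content and layout features rather than from known row and column indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A table cell described by the inclusive row/column range it occupies.

    Hypothetical representation chosen for this sketch.
    """
    text: str
    start_row: int
    end_row: int
    start_col: int
    end_col: int


def is_spanning(cell):
    """A spanning cell occupies at least two rows or at least two columns."""
    return cell.end_row > cell.start_row or cell.end_col > cell.start_col


def _overlap(a_start, a_end, b_start, b_end):
    """True if the inclusive integer ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def relation(a, b):
    """Label a pair of cells as 'horizontal', 'vertical', or 'none'.

    Assumed rule: two cells are horizontally related if they share a row and
    sit in directly neighbouring columns, and vertically related if they share
    a column and sit in directly neighbouring rows.
    """
    rows_overlap = _overlap(a.start_row, a.end_row, b.start_row, b.end_row)
    cols_overlap = _overlap(a.start_col, a.end_col, b.start_col, b.end_col)
    if rows_overlap and (b.start_col == a.end_col + 1 or a.start_col == b.end_col + 1):
        return "horizontal"
    if cols_overlap and (b.start_row == a.end_row + 1 or a.start_row == b.end_row + 1):
        return "vertical"
    return "none"


# A tiny table: one header spanning two columns above two ordinary cells.
header = Cell("Results", 0, 0, 0, 1)
precision = Cell("Precision", 1, 1, 0, 0)
recall = Cell("Recall", 1, 1, 1, 1)
value = Cell("0.9", 2, 2, 0, 0)
```

In this toy table the spanning header is vertically related to both cells beneath it, which is exactly the case a simple one-cell-per-grid-position assumption fails to capture.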