LORE++: Logical location regression network for table structure recognition with pre-training
Published in: Pattern Recognition, January 2025, Vol. 157, Article 110816
Format: Article
Language: English
Abstract: Table structure recognition (TSR) aims to convert tables in images into machine-understandable formats. Current approaches address this task either by predicting the adjacency of detected cells or by directly generating structural sequences. However, these approaches either rely on additional heuristic rules for post-processing or require generating extremely long sequences, which is computationally costly. In this paper, we redefine TSR as a LOgical location REgression paradigm, which effectively captures the inherent logical dependencies and constraints among table cells. Correspondingly, we propose LORE, a novel approach for TSR. LORE simultaneously predicts accurate geometric coordinates of table cells and the logical structure of the entire table. LORE is conceptually simpler, easier to train, and more accurate than other TSR paradigms. Moreover, to enhance the model's spatial and logical representation capabilities, we propose two pre-training tasks, resulting in an upgraded version named LORE++. Pre-training is shown to substantially improve accuracy, generalization, and few-shot capability compared with its predecessor. Experiments on standard benchmarks demonstrate the superiority of LORE++, highlighting the promise of the logical location regression paradigm for TSR.
Highlights:
• A novel logical location regression paradigm for visual table structure recognition.
• Simultaneous prediction of the spatial and logical locations of table cells.
• Comprehensive analysis of the proposed logical location regression.
• Two pre-training strategies for table structures, further boosting performance.
• Significant improvement on various benchmarks.
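The abstract frames TSR as regressing each cell's logical location rather than predicting cell adjacency or generating long markup sequences. The sketch below shows why such per-cell predictions can be turned into table markup directly, and how the grid constraint among cells can be checked. It assumes, without taking anything from the paper's code, that a logical location is the inclusive index quadruple (start row, end row, start column, end column). The names `Cell`, `cells_to_html` and `covers_grid_exactly_once` are hypothetical. This is not LORE's implementation. The network that predicts these indices is out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell record: text plus inclusive logical indices (assumed format)."""
    text: str
    start_row: int
    end_row: int
    start_col: int
    end_col: int


def covers_grid_exactly_once(cells: list[Cell]) -> bool:
    """Check the tiling constraint: every grid slot belongs to exactly one cell."""
    if not cells:
        return False
    n_rows = max(c.end_row for c in cells) + 1
    n_cols = max(c.end_col for c in cells) + 1
    counts = [[0] * n_cols for _ in range(n_rows)]
    for c in cells:
        for r in range(c.start_row, c.end_row + 1):
            for k in range(c.start_col, c.end_col + 1):
                counts[r][k] += 1
    return all(v == 1 for row in counts for v in row)


def cells_to_html(cells: list[Cell]) -> str:
    """Rebuild an HTML table from per-cell logical locations.

    Each cell is emitted in the row where it starts. Its rowspan/colspan
    follow directly from its index ranges, so no adjacency heuristics are needed.
    """
    if not cells:
        return "<table></table>"
    n_rows = max(c.end_row for c in cells) + 1
    rows: list[list[Cell]] = [[] for _ in range(n_rows)]
    for c in cells:
        rows[c.start_row].append(c)
    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>")
        for c in sorted(row, key=lambda cell: cell.start_col):
            attrs = ""
            row_span = c.end_row - c.start_row + 1
            col_span = c.end_col - c.start_col + 1
            if row_span > 1:
                attrs += f' rowspan="{row_span}"'
            if col_span > 1:
                attrs += f' colspan="{col_span}"'
            parts.append(f"<td{attrs}>{c.text}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


if __name__ == "__main__":
    # Toy 2x3 table: "Name" spans two rows, "Score" spans two columns.
    table = [
        Cell("Name", 0, 1, 0, 0),
        Cell("Score", 0, 0, 1, 2),
        Cell("A", 1, 1, 1, 1),
        Cell("B", 1, 1, 2, 2),
    ]
    print(covers_grid_exactly_once(table))
    print(cells_to_html(table))
```

The grid check is one simple example of the "constraints among table cells" that the abstract mentions. A set of predicted logical locations is only a valid table if its cells tile the grid without gaps or overlaps. How LORE itself models these dependencies is described in the paper, not here.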
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.110816