LORE++: Logical location regression network for table structure recognition with pre-training

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Current approaches address this issue by either predicting the adjacency of detected cells or direct generation of structural sequences. Nonetheless, these approaches either count on additional...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Pattern recognition 2025-01, Vol.157, p.110816, Article 110816
Hauptverfasser: Long, Rujiao, Xing, Hangdi, Yang, Zhibo, Zheng, Qi, Yu, Zhi, Huang, Fei, Yao, Cong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Current approaches address this issue by either predicting the adjacency of detected cells or direct generation of structural sequences. Nonetheless, these approaches either count on additional heuristic rules for post-processing, or involve the generation of extremely long-range sequences that lead to computational intricacy. In this paper, We redefine TSR as a LOgical location REgression paradigm, which effectively captures inherent logical dependencies and constraints among table cells. Correspondingly, we propose LORE, a novel approach for TSR. LORE simultaneously predicts accurate geometric coordinates of table cells and the logical structures of the entire table. Our proposed LORE is conceptually simpler, easier to train, and more accurate than other TSR paradigms. Moreover, to enhance the model’s spatial and logical representation capabilities, we propose two pre-training tasks, resulting in an upgraded version named LORE++. The incorporation of pre-training is proven to enjoy significant advantages, leading to a substantial enhancement in terms of accuracy, generalization, and few-shot capability compared to its predecessor. Experiments on standard benchmarks demonstrate the superiority of LORE++, which highlights the potential and promising prospect of the logical location regression paradigm for TSR. •A novel logical location regression paradigm for visual table structure recognition.•Simultaneous predictions of spatial and logical locations of the table cells.•Comprehensive analysis for the proposed logical location regression.•Two pre-training strategies for table structures, further boosting the performance.•Significant improvement on various benchmarks.
ISSN:0031-3203
DOI:10.1016/j.patcog.2024.110816