PDF table structure identification method based on graph attention mechanism
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention relates to a PDF table structure recognition method based on a graph attention mechanism and belongs to the technical field of document analysis within data mining. The method comprises three steps: (1) preprocessing, in which all cells in a table and their position coordinates are obtained; (2) graph construction, in which an undirected graph is built over the obtained cells; and (3) relationship prediction, in which the edges of the constructed undirected graph are classified by a neural network model to predict the adjacency relationships between cells. Compared with the prior art, the invention is presented as the first method for recognizing complex table structures in PDF documents; it achieves the best results on two table structure recognition data sets, with a clear improvement on complex table structures.
本发明涉及一种基于图注意力机制的PDF表格结构识别方法,属于数据挖掘技术中的文档分析技术领域;包括以下步骤:一、预处理:获取表格中的所有单元格以及它们的位置坐标;二、图构建:对得到的单元格建立无向图;三、关系预测:通过对构建的无向图上的边进行分类,使用神经网络模型 |
---|
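
The three steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how edges are chosen or how the network is parameterised, so the k-nearest-neighbour rule, the function names (`build_knn_graph`, `neighbours`, `attention_layer`) and the use of cell centers as node features are all assumptions made for illustration. The attention step follows the standard single-head graph attention formulation (a LeakyReLU score over concatenated projected features, normalised by softmax over each node's neighbourhood).

```python
import math


def center(box):
    """Center (x, y) of a cell bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def build_knn_graph(boxes, k=2):
    """Step 2 (assumed rule): link each cell to its k nearest cells.

    The abstract only states that an undirected graph is built over the
    cells; k-nearest neighbours by center distance is a hypothetical choice.
    """
    centers = [center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: (math.dist(ci, centers[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return sorted(edges)


def neighbours(n, edges):
    """Adjacency sets with self-loops, as is usual for graph attention."""
    adj = {i: {i} for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_layer(features, adj, W, a):
    """One single-head graph attention layer.

    features: list of node feature vectors
    W: projection matrix (out_dim rows, in_dim columns)
    a: attention vector of length 2 * out_dim
    """
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    out = []
    for i in range(len(features)):
        nbrs = sorted(adj[i])
        # Unnormalised score for each neighbour j: LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j]))) for j in nbrs
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append(
            [sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(z[i]))]
        )
    return out


# Step 1 (assumed input): four cells of a 2 x 2 table with their coordinates.
boxes = [(0, 0, 10, 5), (10, 0, 20, 5), (0, 5, 10, 10), (10, 5, 20, 10)]
edges = build_knn_graph(boxes, k=2)
print(edges)  # [(0, 1), (0, 2), (1, 3), (2, 3)]

# Step 3 would classify each edge; here only the attention update is shown,
# with untrained placeholder weights (identity projection, zero attention vector).
features = [list(center(b)) for b in boxes]
embeddings = attention_layer(
    features, neighbours(len(boxes), edges), [[1.0, 0.0], [0.0, 1.0]], [0.0] * 4
)
```

In a trained model the embeddings of the two endpoints of each edge would be passed to a classifier that outputs the adjacency relation for that pair of cells; the weights above are placeholders, so the sketch shows only the data flow.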