PDF (Portable Document Format) document analysis method, analysis device, equipment and medium
Main authors: | , |
---|---|
Format: | Patent |
Language: | Chinese; English |
Summary: | The invention provides a PDF document analysis method, device, equipment and medium, and belongs to the technical field of data processing. The method comprises: extracting information from a PDF document to be analyzed to obtain the document characters it contains and the character position information of each document character; performing layout recognition on the PDF document to determine at least two layout areas, together with the layout category and the area position information of each layout area in the recognition result; for each layout area, determining the target document characters that correspond to it from among the document characters, based on the area position information of that layout area and the character position information of the document characters; and, according to a filling strategy corresponding to the layout category of the layout area, filling the target doc |
---|
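The character-to-area matching step in the summary can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `Char` and `Region` types, the axis-aligned bounding boxes, and the rule that a character belongs to an area when its center point lies inside the area's box. The filling strategy is not shown, because the summary is cut off before describing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """A document character with a hypothetical bounding box (x0, y0, x1, y1)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Region:
    """A layout area with a category label and a hypothetical bounding box."""
    category: str
    x0: float
    y0: float
    x1: float
    y1: float


def contains_center(region: Region, ch: Char) -> bool:
    # Assumed matching rule: the character's center lies inside the region box.
    cx = (ch.x0 + ch.x1) / 2
    cy = (ch.y0 + ch.y1) / 2
    return region.x0 <= cx <= region.x1 and region.y0 <= cy <= region.y1


def assign_chars(regions: list[Region], chars: list[Char]) -> list[list[Char]]:
    """Return one list of target characters per region, in input order."""
    return [[c for c in chars if contains_center(r, c)] for r in regions]


chars = [
    Char("T", 10, 10, 18, 22),
    Char("i", 19, 10, 22, 22),
    Char("a", 10, 60, 17, 72),
]
regions = [
    Region("title", 0, 0, 200, 30),
    Region("paragraph", 0, 50, 200, 120),
]

grouped = assign_chars(regions, chars)
for region, members in zip(regions, grouped):
    print(region.category, "".join(c.text for c in members))
```

A center-point test is only one possible criterion; an overlap-ratio threshold would be an equally plausible reading of "according to the area position information ... and the character position information".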