PDF (Portable Document Format) document analysis method, analysis device, equipment and medium


Bibliographic details
Main authors: HUO JINGCHAO, YUAN JIE
Format: Patent
Language: Chinese; English
Description
Summary: The invention provides a PDF document analysis method, device, equipment, and medium, and belongs to the technical field of data processing. The method comprises: extracting information from a PDF document to be analyzed to obtain the document characters it contains and the character position information of each document character; performing layout identification on the PDF document and determining, from the layout identification result, at least two layout areas together with the layout category and area position information of each layout area; for each layout area, determining the target document characters corresponding to that area from the extracted document characters, according to the area position information of the layout area and the character position information of the document characters; and, according to a filling strategy corresponding to the layout category of the layout area, filling the target doc
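
The character-to-area matching step described in the summary can be illustrated with a minimal Python sketch. The summary does not say how positions are compared, so everything here is an assumption made for illustration: the names `Box`, `Char`, `Region`, `assign_chars`, and `region_text` are hypothetical, positions are taken to be axis-aligned boxes with y growing downward, and a character is assigned to the first area that contains its box center. The category-specific filling strategy is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; assumed coordinate system has y growing downward."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Char:
    text: str
    box: Box  # character position information


@dataclass(frozen=True)
class Region:
    category: str  # layout category, e.g. "title" or "paragraph"
    box: Box       # area position information


def assign_chars(chars, regions):
    """Map each region index to the characters whose box center lies inside it.

    Center-point containment is an assumed matching rule, not one taken
    from the patent text.
    """
    assigned = {i: [] for i in range(len(regions))}
    for ch in chars:
        cx = (ch.box.x0 + ch.box.x1) / 2
        cy = (ch.box.y0 + ch.box.y1) / 2
        for i, region in enumerate(regions):
            if region.box.contains(cx, cy):
                assigned[i].append(ch)
                break  # first matching region wins; unmatched characters are skipped
    return assigned


def region_text(chars):
    """Join characters in an assumed reading order: top to bottom, then left to right."""
    ordered = sorted(chars, key=lambda c: (c.box.y0, c.box.x0))
    return "".join(c.text for c in ordered)


# Made-up example data: two layout areas and four characters.
regions = [
    Region("title", Box(0, 0, 100, 20)),
    Region("paragraph", Box(0, 30, 100, 100)),
]
chars = [
    Char("i", Box(16, 5, 18, 15)),
    Char("H", Box(10, 5, 15, 15)),
    Char("a", Box(10, 40, 15, 50)),
    Char("?", Box(200, 200, 205, 210)),  # lies outside every area
]
assigned = assign_chars(chars, regions)
```

With this example data, `region_text(assigned[0])` yields the title characters in reading order, and the character outside both areas is left unassigned. A real implementation might prefer an overlap-ratio test over center containment when characters straddle area borders.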