Method for cutting PDF files related to back-door listing topics

Bibliographic details
Main authors: ZHOU SHUAIPENG, JING SHUJUAN, ZHANG BEIBEI, XU XIAOYAN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a method for cutting PDF files related to back-door listing topics. The method comprises the following steps: 1, obtaining public business files stored in PDF format by means of a distributed internet crawler; 2, determining the language description features, keywords and keyword headlines of the PDF files related to back-door listing topics; 3, determining a PDF file page number information set P, consisting of the pages that contain the keywords and the keyword headlines; 4, applying a page number exception removal mechanism to remove exceptional page numbers from the set P obtained in step 3, yielding the cleaned PDF file page number information set Pfinal; 5, cutting the source PDF files on the back-door listing topics according to the set Pfinal obtained in step 4, which completes the cutting of the PDF files related to back-door listing topics. By means of the method, the PDF files
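
Steps 3 to 5 of the summary can be illustrated with a minimal Python sketch. The record does not say how the exception removal mechanism works, so the rule used here (keep the largest cluster of page numbers that lie close together) is an assumption made purely for illustration, as are the function names and the `max_gap` parameter. The sketch works on page texts that have already been extracted and does not read or write PDF files itself.

```python
from typing import List, Sequence


def find_candidate_pages(page_texts: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Step 3: 1-based numbers of pages whose text contains any keyword (set P)."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if any(keyword in text for keyword in keywords)
    ]


def remove_exceptional_pages(pages: Sequence[int], max_gap: int = 2) -> List[int]:
    """Step 4 (assumed rule, not taken from the patent): keep the largest
    cluster of page numbers whose neighbours are at most max_gap apart."""
    clusters: List[List[int]] = []
    for page in sorted(set(pages)):
        if clusters and page - clusters[-1][-1] <= max_gap:
            clusters[-1].append(page)   # close enough: extend current cluster
        else:
            clusters.append([page])     # too far away: start a new cluster
    # On ties, max() returns the earliest of the largest clusters.
    return max(clusters, key=len) if clusters else []


def page_range_to_cut(p_final: Sequence[int]) -> range:
    """Step 5: contiguous 1-based page range to extract from the source file."""
    if not p_final:
        return range(0)
    return range(min(p_final), max(p_final) + 1)


# Hypothetical page texts: page 1 is a table of contents that mentions the topic.
pages = [
    "contents: back-door listing",
    "company overview",
    "financial data",
    "shareholders",
    "back-door listing plan",
    "back-door listing terms",
    "back-door listing risks",
    "appendix",
]
P = find_candidate_pages(pages, ["back-door listing"])
P_final = remove_exceptional_pages(P)
print(P, P_final, list(page_range_to_cut(P_final)))
```

In this example the isolated match on page 1 is treated as exceptional and dropped, leaving pages 5 to 7 as the range to cut. Extracting those pages from the actual PDF would be done with a PDF library of choice.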