Method for cutting PDF files related to back-door listing topics
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention discloses a method for cutting PDF files related to back-door listing topics. The method comprises the following steps: 1, obtaining public business files stored in PDF format by means of a distributed internet crawler; 2, determining the language description features, keywords and keyword headlines of the PDF files related to back-door listing topics; 3, determining a PDF file page number information set P, consisting of the pages that contain the keywords and the keyword headlines; 4, applying a page number exception removal mechanism to remove exceptional page numbers from the set P obtained in step 3, yielding the filtered PDF file page number information set Pfinal; 5, cutting the source PDF files on the back-door listing topics according to the set Pfinal obtained in step 4, which completes the cutting of the PDF files related to back-door listing topics. By means of the method, the PDF files |
---|
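Steps 3 to 5 of the summary can be illustrated with a minimal Python sketch. The record does not say how the exception removal mechanism works, so the rule used here (keep the longest cluster of page numbers whose neighbours are at most `max_gap` pages apart) is an assumption, as are all function names, the `max_gap` parameter and the sample keyword. Pages are represented as plain text strings; reading and writing real PDF files would need a PDF library and is left out.

```python
from typing import List, Sequence


def find_candidate_pages(page_texts: Sequence[str],
                         keywords: Sequence[str]) -> List[int]:
    """Step 3: 1-based numbers of pages containing any keyword (set P)."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if any(keyword in text for keyword in keywords)
    ]


def remove_exceptional_pages(pages: Sequence[int],
                             max_gap: int = 2) -> List[int]:
    """Step 4 (assumed rule, not taken from the patent): keep the longest
    cluster of page numbers whose neighbours differ by at most max_gap."""
    best: List[int] = []
    current: List[int] = []
    for page in sorted(set(pages)):
        if current and page - current[-1] > max_gap:
            # Gap too large: close the current cluster and start a new one.
            if len(current) > len(best):
                best = current
            current = []
        current.append(page)
    if len(current) > len(best):
        best = current
    return best


def cut_range(p_final: Sequence[int]) -> range:
    """Step 5: contiguous page range (first to last page of Pfinal) to cut."""
    if not p_final:
        return range(0)
    return range(p_final[0], p_final[-1] + 1)


# Hypothetical eight-page document: page 2 is a table-of-contents hit,
# pages 5 to 7 hold the actual topic section.
sample_pages = [
    "cover",
    "contents: back-door listing ... 5",
    "company overview",
    "financial summary",
    "back-door listing plan",
    "back-door listing terms",
    "back-door listing risks",
    "appendix",
]
P = find_candidate_pages(sample_pages, ["back-door listing"])
P_final = remove_exceptional_pages(P)
pages_to_cut = list(cut_range(P_final))
```

In this sample, the isolated hit on the contents page is treated as exceptional and dropped, so only the contiguous topic section is selected for cutting. A different outlier rule would change `P_final` without affecting the other two steps.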