File content identification method and equipment
Main Authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention provides a file content identification method and equipment. The equipment comprises three modules. A configuration module defines a file and the attributes of the collection items in the file, obtains the positions of the collection items in the file, and configures identification rules for the collection items. A task module creates a recognition task, performs a preliminary recognition of the file content, and divides the recognition task into a character recognition task and an OCR recognition task according to the preliminary recognition result. An acquisition module collects the content of each collection item in the file according to the recognition task and identifies the text in that content according to the rules defined in the configuration module. With this method, the choice between character recognition and OCR picture recognition can be made automatically, flexible configurable processing of collection of collection items is |
---|
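The three modules in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and the abstract does not specify these details. The names `CollectionItem`, `Page`, `classify`, and `collect` are hypothetical. The sketch also assumes that positions are page indices, that identification rules are regular expressions, that the preliminary recognition only checks whether a page has extractable text, and that the caller supplies the OCR engine as a function.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class TaskKind(Enum):
    CHARACTER = "character"  # text can be read directly from the file
    OCR = "ocr"              # content is a picture and needs OCR


@dataclass
class CollectionItem:
    """Configuration module (assumed shape): attribute, position, rule."""
    name: str     # attribute: name of the collection item
    page: int     # position: index of the page holding the item (assumption)
    pattern: str  # identification rule: regex with one capture group (assumption)


@dataclass
class Page:
    text: str = ""      # extractable text; empty for scanned pages
    image: bytes = b""  # raw picture data for scanned pages


def classify(page: Page) -> TaskKind:
    """Task module: preliminary recognition that picks the task type."""
    return TaskKind.CHARACTER if page.text.strip() else TaskKind.OCR


def collect(
    pages: List[Page],
    items: List[CollectionItem],
    ocr: Callable[[bytes], str],
) -> Dict[str, Optional[str]]:
    """Acquisition module: read each item and apply its configured rule."""
    results: Dict[str, Optional[str]] = {}
    for item in items:
        page = pages[item.page]
        if classify(page) is TaskKind.CHARACTER:
            text = page.text
        else:
            text = ocr(page.image)  # OCR engine is injected by the caller
        match = re.search(item.pattern, text)
        results[item.name] = match.group(1) if match else None
    return results
```

Passing the OCR engine in as a parameter keeps the sketch independent of any particular OCR library. A stub function is enough to exercise the routing between the two task types.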