Extensible document content structuring method and system based on computer vision

Main authors: LI DI, SUN JIANZHONG, LI LIHUA, QU JIABO, MENG ZHAOHAI, QIU JUAN, ZHANG GUIFA
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to an extensible, computer-vision-based method and system for structuring document content. The method comprises the following steps: defining a universal hierarchical document structure, universal document elements and special document elements, and constructing a universal document structure recognition model; acquiring an image sequence of the training-group documents, pre-labeling that image sequence, constructing a special document structure recognition data set, and training the universal document structure recognition model on it to form a special document recognition model; obtaining a to-be-structured image sequence; and recognizing the to-be-structured image sequence with the special document recognition model, converting it into structured document information, and outputting the structured document information. The document content structuring method provided by the invention is extensible, can
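
The idea of a universal hierarchical structure that is extended with special document elements can be illustrated with a minimal Python sketch. This is not the patented implementation: the element names, the `Element` and `Schema` classes, and the `to_structured` helper are all hypothetical, and the recognition model itself is left out. The sketch only shows how a generic element vocabulary might be widened for one document type and how a recognized hierarchy might be emitted as structured information.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set

# Hypothetical universal element types; the abstract does not enumerate them.
GENERIC_ELEMENT_TYPES: Set[str] = {
    "document", "section", "title", "paragraph", "table", "figure",
}


@dataclass
class Element:
    """One node of the hierarchical document structure."""
    kind: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)


class Schema:
    """Universal element types plus optional special (document-specific) ones."""

    def __init__(self, special_types: Iterable[str] = ()) -> None:
        self.allowed: Set[str] = set(GENERIC_ELEMENT_TYPES) | set(special_types)

    def validate(self, element: Element) -> None:
        """Raise ValueError if any node in the tree uses an unknown element type."""
        if element.kind not in self.allowed:
            raise ValueError(f"unknown element type: {element.kind}")
        for child in element.children:
            self.validate(child)


def to_structured(element: Element) -> Dict[str, object]:
    """Convert the element tree into plain nested dictionaries."""
    return {
        "kind": element.kind,
        "text": element.text,
        "children": [to_structured(child) for child in element.children],
    }


if __name__ == "__main__":
    # Assumed example: an invoice-specific schema adds one special element type.
    invoice_schema = Schema(special_types={"invoice_number"})
    page = Element("document", children=[
        Element("section", children=[
            Element("title", "Invoice"),
            Element("invoice_number", "INV-1"),
        ]),
    ])
    invoice_schema.validate(page)
    print(to_structured(page))
```

In this sketch, extensibility amounts to passing additional element types to `Schema`; the universal types and the tree format stay unchanged, which mirrors the split between the universal model and the special model described in the summary.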