Estimation of document structure

Bibliographic details
Main author: Hatsutori, Yoichi
Format: Patent
Language: English
Description
Summary: A system and method for estimating the structure of a document. The method extracts one or more candidate elements describing the document structure from the document, groups the candidate elements into a group, and builds one or more trees for the group. Each tree has a root node and a leaf node selected from the candidate elements in the group. The method then prunes the trees while leaving a path from the root node to the leaf node, based on whether the text corresponding to that path is accommodated in a single group of words.
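
The extract, group, build, and prune steps in the summary can be illustrated with a minimal Python sketch. It is not the patented method: the record gives no implementation details, so every name here (Candidate, group_candidates, build_paths, prune, estimate_structure) is hypothetical. The sketch assumes that candidates carry a grouping key, that each tree is reduced to two-node root-to-leaf paths, and that "accommodated in a single group of words" means the path text appears inside one word group such as a text line.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate element extracted from a document."""
    text: str
    group: str  # assumed grouping key, e.g. a shared visual style


def group_candidates(candidates):
    """Group candidate elements by their (assumed) grouping key."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand.group].append(cand)
    return dict(groups)


def build_paths(group):
    """Simplification: represent the trees of a group as every ordered
    (root, leaf) pair of distinct candidates in that group."""
    return list(permutations(group, 2))


def fits_single_word_group(path, word_groups):
    """Assumed pruning test: the text along the path must occur
    inside one word group (here, one line of text)."""
    path_text = " ".join(node.text for node in path)
    return any(path_text in words for words in word_groups)


def prune(paths, word_groups):
    """Keep only the root-to-leaf paths that pass the test above."""
    return [p for p in paths if fits_single_word_group(p, word_groups)]


def estimate_structure(candidates, word_groups):
    """Run grouping, path building, and pruning for each group."""
    return {
        key: prune(build_paths(members), word_groups)
        for key, members in group_candidates(candidates).items()
    }


if __name__ == "__main__":
    lines = ["Chapter 1 Introduction", "Section 2 Methods"]
    cands = [
        Candidate("Chapter", "heading"),
        Candidate("1", "heading"),
        Candidate("Methods", "heading"),
    ]
    for key, paths in estimate_structure(cands, lines).items():
        print(key, [[node.text for node in path] for path in paths])
```

In this toy input only the path from "Chapter" to "1" survives pruning, because "Chapter 1" is the only path text that lies within a single line. A real implementation would build deeper trees and would likely use a richer notion of word group than a substring test.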