Text representation method and apparatus

Bibliographic details
Main authors: Fan, Zhigang; Tse, Francis K
Format: Patent
Language: English
Description
Abstract: A text-like data representation technique and a text-like data representation apparatus are disclosed. They may acquire image data from a scanned image, segment text regions from the image data, and extract each connected component in the text regions. They may then form clusters based on the connected components and group each connected component in the text regions into the cluster of similar or identical characters. Next, they may generate a high-resolution representative for each cluster and a vector representation for each high-resolution representative. Finally, they may code the text as text data by associating each connected component with its vectorized high-resolution representative and its location in the document.
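The following minimal Python sketch illustrates the steps in the abstract. It assumes that a text region has already been segmented and binarized into a list of rows of 0/1 pixels. The abstract does not say how characters are matched, how the high-resolution representative is built, or which vector format is used. The choices below are therefore stand-ins:

- greedy clustering by pixel-mismatch fraction against each cluster's first member;
- bilinear upsampling of the cluster's average bitmap, followed by a 0.5 threshold;
- a crude vectorizer that emits one rectangle per horizontal pixel run.

All function names and parameters (`connected_components`, `mismatch`, `cluster`, `representative`, `vectorize`, `encode`, `scale`, `threshold`) are hypothetical.

```python
from collections import deque
import math

# Hypothetical sketch: the matching, upsampling and vectorizing choices are assumptions.

def connected_components(img):
    """Label 8-connected foreground (1) pixels; return [(x0, y0, bitmap)]."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            pixels, queue = [], deque([(y, x)])
            while queue:  # breadth-first flood fill over the 8 neighbours
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            y0, x0 = min(p[0] for p in pixels), min(p[1] for p in pixels)
            bh = max(p[0] for p in pixels) - y0 + 1
            bw = max(p[1] for p in pixels) - x0 + 1
            bmp = [[0] * bw for _ in range(bh)]  # cropped bitmap of this component
            for py, px in pixels:
                bmp[py - y0][px - x0] = 1
            comps.append((x0, y0, bmp))
    return comps

def mismatch(a, b):
    """Fraction of differing pixels; infinite if the bitmap sizes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return math.inf
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))

def cluster(comps, threshold=0.1):
    """Greedy clustering: join the first cluster whose prototype is close enough."""
    clusters, labels = [], []
    for _, _, bmp in comps:
        for cid, members in enumerate(clusters):
            if mismatch(members[0], bmp) <= threshold:
                members.append(bmp)
                labels.append(cid)
                break
        else:  # no match: this component starts a new cluster
            clusters.append([bmp])
            labels.append(len(clusters) - 1)
    return clusters, labels

def representative(members, scale=4):
    """Average the members, upsample bilinearly by `scale`, threshold at 0.5."""
    h, w = len(members[0]), len(members[0][0])
    avg = [[sum(m[y][x] for m in members) / len(members) for x in range(w)] for y in range(h)]
    def sample(v, n):  # map an output index to a clamped source coordinate
        s = min(max((v + 0.5) / scale - 0.5, 0.0), n - 1)
        i = int(s)
        return i, min(i + 1, n - 1), s - i
    out = []
    for Y in range(h * scale):
        y0, y1, fy = sample(Y, h)
        row = []
        for X in range(w * scale):
            x0, x1, fx = sample(X, w)
            top = avg[y0][x0] * (1 - fx) + avg[y0][x1] * fx
            bot = avg[y1][x0] * (1 - fx) + avg[y1][x1] * fx
            row.append(1 if top * (1 - fy) + bot * fy >= 0.5 else 0)
        out.append(row)
    return out

def vectorize(hires, scale=4):
    """Stand-in vectorizer: one rectangle (x, y, w, h) per run, in source-pixel units."""
    rects = []
    for y, row in enumerate(hires):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                rects.append((start / scale, y / scale, (x - start) / scale, 1 / scale))
            else:
                x += 1
    return rects

def encode(img, scale=4, threshold=0.1):
    """Return (glyph table of vector shapes, [(cluster_id, x, y)] per component)."""
    comps = connected_components(img)
    clusters, labels = cluster(comps, threshold)
    glyphs = [vectorize(representative(m, scale), scale) for m in clusters]
    placements = [(cid, x0, y0) for cid, (x0, y0, _) in zip(labels, comps)]
    return glyphs, placements
```

Consider an image containing two identical 2x2 squares and one 3-pixel vertical bar. Here `encode` returns two glyphs and three placements, and both squares point at glyph 0. Because every member of a cluster shares one vectorized shape, each repeated character is stored once, plus a small (cluster_id, x, y) record per occurrence.

This sketch only interpolates the cluster average. A real coder would likely register members at sub-pixel offsets before combining them, trace contours and fit curves instead of emitting pixel runs, and compare glyphs after alignment rather than requiring equal bounding-box sizes.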