CHINESE, JAPANESE, OR KOREAN LANGUAGE DETECTION

Disclosed are systems, computer-readable mediums, and methods for determining a text contains Chinese, Japanese, or Korean characters. A document image is received and binarized. The binarized document image is searched for connected components. A plurality of fragments is identified based on the co...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: CHULININ YURI GEORGIEVICH, DERYAGIN DMITRY GEORGIEVICH, YURIEVICH ATROSHCHENKO MIKHAIL
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Disclosed are systems, computer-readable mediums, and methods for determining a text contains Chinese, Japanese, or Korean characters. A document image is received and binarized. The binarized document image is searched for connected components. A plurality of fragments is identified based on the connected components. A language hypothesis for each fragment of the plurality of fragments is determined. The language hypothesis has a probability rating. A subset of fragments from the plurality of fragments having the highest probability ratings is selected. The language hypothesis of each fragment in the subset of fragments is verified. A determination of the presence of Chinese, Japanese, or Korean characters is made based at least on the verification of the language hypothesis of the subset of fragments.