CHINESE, JAPANESE, OR KOREAN LANGUAGE DETECTION
Main authors: | , , |
---|---|
Format: | Patent |
Language: | eng |
Abstract: | Disclosed are systems, computer-readable media, and methods for determining whether a text contains Chinese, Japanese, or Korean characters. A document image is received and binarized. The binarized document image is searched for connected components. A plurality of fragments is identified based on the connected components. A language hypothesis is determined for each fragment of the plurality of fragments. Each language hypothesis has a probability rating. A subset of the fragments having the highest probability ratings is selected. The language hypothesis of each fragment in the subset is verified. A determination of the presence of Chinese, Japanese, or Korean characters is made based at least on the verification of the language hypotheses of the subset of fragments. |
---|
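
The steps named in the abstract (binarize, find connected components, group them into fragments, score a hypothesis per fragment, keep the highest-rated fragments, verify them, decide) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented method: the abstract does not say how the probability rating or the verification is computed, so `cjk_probability` (share of roughly square component boxes) and `verify` (simple thresholds) are placeholder assumptions, as are all function names, parameters, and default values.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected ink regions."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def group_fragments(boxes, max_gap=3):
    """Group boxes into left-to-right runs that overlap vertically
    and are separated by at most max_gap columns."""
    fragments = []
    for box in sorted(boxes):
        for frag in fragments:
            last = frag[-1]
            overlaps = box[1] <= last[3] and last[1] <= box[3]
            if overlaps and box[0] - last[2] <= max_gap:
                frag.append(box)
                break
        else:
            fragments.append([box])
    return fragments

def cjk_probability(fragment):
    """ASSUMPTION: placeholder rating, the share of roughly square boxes."""
    def squarish(b):
        w, h = b[2] - b[0] + 1, b[3] - b[1] + 1
        return 0.75 <= w / h <= 1.33
    return sum(squarish(b) for b in fragment) / len(fragment)

def verify(fragment, probability, min_probability=0.6, min_boxes=2):
    """ASSUMPTION: placeholder check standing in for real verification."""
    return probability >= min_probability and len(fragment) >= min_boxes

def contains_cjk(gray, top_k=5, min_verified_ratio=0.5):
    """Run the full pipeline and return True if CJK text is judged present."""
    fragments = group_fragments(connected_components(binarize(gray)))
    scored = sorted(((cjk_probability(f), f) for f in fragments),
                    key=lambda s: s[0], reverse=True)
    subset = scored[:top_k]  # fragments with the highest ratings
    if not subset:
        return False
    verified = sum(verify(f, p) for p, f in subset)
    return verified / len(subset) >= min_verified_ratio
```

Verifying only the `top_k` highest-rated fragments keeps the costlier verification step off most of the page, which is the apparent motivation for selecting a subset before verifying.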