Language identification for printed text independent of segmentation

This paper presents efficient algorithms for determining the language classification of machine generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting and routing of facsimile documents as they arrive so that appropriate routing a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Wood, S.L., Xiaozhong Yao, Krishnamurthi, K., Dang, L.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper presents efficient algorithms for determining the language classification of machine generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting and routing of facsimile documents as they arrive so that appropriate routing and secondary analysis, which may include OCR, is selected for each document. It may also prove useful as a component of a content addressable document access system. There have been numerous reported efforts which attempt to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation using the less computationally intensive methods described.
DOI:10.1109/ICIP.1995.537663