Language identification

A plurality of documents in each of a plurality of languages can be received. A Latent Semantic Indexing (LSI) index can be created from the plurality of documents. A language classification model can be trained from the LSI index. A document to be identified by language can be received. A vector in...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Bittmann Mark
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A plurality of documents in each of a plurality of languages can be received. A Latent Semantic Indexing (LSI) index can be created from the plurality of documents. A language classification model can be trained from the LSI index. A document to be identified by language can be received. A vector in the LSI index can be generated for the document to be identified by language. The vector can be evaluated against the language classification model.