Language identification
A plurality of documents in each of a plurality of languages can be received. A Latent Semantic Indexing (LSI) index can be created from the plurality of documents. A language classification model can be trained from the LSI index. A document to be identified by language can be received. A vector in...
Gespeichert in:
1. Verfasser: | |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A plurality of documents in each of a plurality of languages can be received. A Latent Semantic Indexing (LSI) index can be created from the plurality of documents. A language classification model can be trained from the LSI index. A document to be identified by language can be received. A vector in the LSI index can be generated for the document to be identified by language. The vector can be evaluated against the language classification model. |
---|