MULTI-LANGUAGE DOCUMENT CLUSTERING

Bibliographic Details
Main Author: BURYAK, KIRILL
Format: Patent
Language: eng; fre; ger
Description
Summary: A technique can include identifying a collection of documents to be clustered. The collection of documents can include foreign language documents and base language documents. The foreign language documents can be translated into the base language at a base language translation module. Keywords in the base language documents and keywords in the translated foreign language documents can be determined at a document indexing module. The base language documents can be clustered with the foreign language documents in a common set of document clusters based on the determined keywords in the base language documents and the determined keywords in the translated foreign language documents. In response to a search query in a first language, a listing of search results can be provided that includes documents in the first language and in another language from a common document cluster.
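
The pipeline in the summary (translate, index keywords, cluster, search) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration and does not come from the patent: the toy glossary `TOY_GLOSSARY` stands in for the base language translation module, frequency-based `extract_keywords` stands in for the document indexing module, and greedy Jaccard-overlap clustering with an arbitrary threshold stands in for whatever clustering method the patent actually uses.

```python
import re
from collections import Counter
from dataclasses import dataclass

BASE_LANG = "en"  # assumed base language
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "with"}

# Hypothetical word-for-word glossary; a real system would call a translation module.
TOY_GLOSSARY = {
    "fr": {"le": "the", "les": "the", "de": "of", "moteur": "engine",
           "recherche": "search", "indexe": "indexes"},
    "de": {"rezept": "recipe", "für": "for", "brot": "bread",
           "mit": "with", "mehl": "flour"},
}


@dataclass
class Document:
    doc_id: str
    lang: str
    text: str


def tokenize(text):
    # Lowercase alphabetic tokens, including accented letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def translate_to_base(text, lang):
    # Base language text passes through; other text is mapped word by word.
    if lang == BASE_LANG:
        return text
    glossary = TOY_GLOSSARY.get(lang, {})
    return " ".join(glossary.get(tok, tok) for tok in tokenize(text))


def extract_keywords(text, top_k=5):
    # Keywords are the most frequent non-stopword tokens.
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return {word for word, _ in counts.most_common(top_k)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    # Greedy clustering: join the most similar cluster or start a new one.
    clusters = []
    for doc in docs:
        kws = extract_keywords(translate_to_base(doc.text, doc.lang))
        best = max(clusters, key=lambda c: jaccard(kws, c["keywords"]), default=None)
        if best is not None and jaccard(kws, best["keywords"]) >= threshold:
            best["docs"].append(doc)
            best["keywords"] |= kws
        else:
            clusters.append({"keywords": set(kws), "docs": [doc]})
    return clusters


def search(query, query_lang, clusters):
    # Return every document in the cluster that best matches the query keywords.
    kws = extract_keywords(translate_to_base(query, query_lang))
    best = max(clusters, key=lambda c: jaccard(kws, c["keywords"]), default=None)
    if best is None or not (kws & best["keywords"]):
        return []
    return best["docs"]


if __name__ == "__main__":
    docs = [
        Document("en1", "en", "The search engine indexes documents"),
        Document("fr1", "fr", "Le moteur de recherche indexe les documents"),
        Document("en2", "en", "Fresh bread recipe with flour and yeast"),
        Document("de1", "de", "Rezept für Brot mit Mehl"),
    ]
    clusters = cluster_documents(docs)
    for hit in search("recherche documents", "fr", clusters):
        print(hit.doc_id, hit.lang)
```

Because keywords are compared only after translation into the base language, a French query in this sketch can surface both the French and the English document from the same cluster, which is the behavior the summary describes for search results.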