SYSTEMATIC MASS NORMALIZATION OF INTERNATIONAL TITLES

A system for generating a database of labeled foreign canonical titles includes an interface and a processor. The interface is to receive a title in a second language. The processor is to 1) store a set of n-grams in a first language in a first database; 2) sanitize the title into a sanitize title i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Giverts Vladimir, Namjoshi Parag Avinash, Boob Pavan, Gateley Kristy, Fan Xiao, Au Michael
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A system for generating a database of labeled foreign canonical titles includes an interface and a processor. The interface is to receive a title in a second language. The processor is to 1) store a set of n-grams in a first language in a first database; 2) sanitize the title into a sanitize title in the second language; 3) translate the sanitized title into a translated title in the first language; 4) break the translated title into n-grams; 5) determine labels for the n-grams using the first database; and 6) determine label to associate with the title.