SYSTEMATIC MASS NORMALIZATION OF INTERNATIONAL TITLES

A system for generating a database of labeled foreign canonical titles includes an interface and a processor. The interface is to receive a title in a second language. The processor is to 1) store a set of n-grams in a first language in a first database; 2) sanitize the title into a sanitize title i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: GIVERTS, Viadimir, FAN, Xiao, AU, Michael, NAMJOSHI, Parag, Avinash, GATELEY, Kristy, BOOB, Pavan
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A system for generating a database of labeled foreign canonical titles includes an interface and a processor. The interface is to receive a title in a second language. The processor is to 1) store a set of n-grams in a first language in a first database; 2) sanitize the title into a sanitize title in the second language; 3) translate the sanitized title into a translated title in the first language; 4) break the translated title into n-grams; 5) determine labels for the n-grams using the first database; and 6) determine label to associate with the title.