SYSTEMATIC MASS NORMALIZATION OF INTERNATIONAL TITLES
Main authors: | |
---|---|
Format: | Patent |
Language: | English |
Subjects: | |
Summary: | A system for generating a database of labeled foreign canonical titles includes an interface and a processor. The interface is to receive a title in a second language. The processor is to 1) store a set of n-grams in a first language in a first database; 2) sanitize the title into a sanitized title in the second language; 3) translate the sanitized title into a translated title in the first language; 4) break the translated title into n-grams; 5) determine labels for the n-grams using the first database; and 6) determine a label to associate with the title. |
---|
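
The six processing steps in the summary can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. Everything concrete in it is an assumption: the titles are treated as job titles purely for the sake of example, `NGRAM_LABELS` and `TOY_TRANSLATIONS` are hypothetical stand-ins for the first database and for a real translation service, and the majority-vote rule in `label_title` is one possible way to pick a single label, which the summary does not specify.

```python
import re
from collections import Counter

# Step 1 (assumed data): first-language (English) n-grams and their labels.
NGRAM_LABELS = {
    "software engineer": "engineering",
    "engineer": "engineering",
    "senior": "seniority",
    "sales manager": "sales",
    "manager": "management",
}

# Hypothetical stand-in for a translation service used in step 3.
TOY_TRANSLATIONS = {
    "ingénieur logiciel senior": "senior software engineer",
    "responsable des ventes": "sales manager",
}


def sanitize(title):
    """Step 2: lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def translate(sanitized_title):
    """Step 3: toy dictionary lookup; unknown titles are returned unchanged."""
    return TOY_TRANSLATIONS.get(sanitized_title, sanitized_title)


def ngrams(text, max_n=2):
    """Step 4: all word n-grams of length 1..max_n."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def label_title(title):
    """Steps 2-6: return (sanitized title, label), with label None if no n-gram matches."""
    sanitized = sanitize(title)
    translated = translate(sanitized)
    # Step 5: look up each n-gram in the first database.
    labels = [NGRAM_LABELS[g] for g in ngrams(translated) if g in NGRAM_LABELS]
    if not labels:
        return sanitized, None
    # Step 6 (assumed rule): the most frequent n-gram label wins.
    return sanitized, Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(label_title("Ingénieur logiciel, Senior!"))
```

Collecting the returned pairs for many input titles would give the database of labeled foreign titles that the summary describes. A real system would need a more careful aggregation rule than a simple vote, for example to break ties between labels.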