METHOD FOR SYSTEMATIC MASS NORMALIZATION OF TITLES
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | A method for normalizing raw titles to canonical titles is described. The method includes designating a set of canonical titles, generating a set of n-grams for each canonical title, assigning a set of attributes to each n-gram, assigning a set of labels to each of the attributes, and storing the labeled canonical title and labeled n-grams in a database. In some examples, a new title may be mapped to an existing canonical title in the database by generating a set of n-grams for the new title, looking up the n-grams in the database of canonical titles, retrieving the set of labels assigned to n-grams in the database that match n-grams from the new title, and assigning those labels to the corresponding attributes of the new title. The new title may then be mapped to a canonical title on the basis of similarly labeled attributes. |
---|
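
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the canonical titles, the attribute names (`role`, `domain`, `seniority`, `credential`), the hand-written `ANNOTATIONS` table, the in-memory dictionary standing in for the database, and the use of Jaccard similarity over (attribute, label) pairs to decide which titles are "similarly labeled" are all assumptions made for illustration.

```python
import re

def ngrams(title, max_n=2):
    """Return word n-grams (n = 1..max_n) of a lower-cased title."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

# Hypothetical canonical titles and hand-assigned labels (n-gram -> {attribute: label}).
CANONICAL_TITLES = ["Software Engineer", "Senior Software Engineer",
                    "Data Engineer", "Registered Nurse"]
ANNOTATIONS = {
    "software": {"domain": "software"},
    "data": {"domain": "data"},
    "engineer": {"role": "engineer"},
    "senior": {"seniority": "senior"},
    "nurse": {"role": "nurse"},
    "registered nurse": {"role": "nurse", "credential": "registered"},
}

def build_database(canonical_titles, annotations):
    """Store labeled n-grams and labeled canonical titles (a dict stands in for the database)."""
    db = {"ngrams": {}, "titles": {}}
    for title in canonical_titles:
        labels = {}
        for gram in ngrams(title):
            if gram in annotations:
                db["ngrams"].setdefault(gram, {}).update(annotations[gram])
                labels.update(annotations[gram])
        db["titles"][title] = labels
    return db

def label_title(raw_title, db):
    """Look up the new title's n-grams and copy over the labels of any that match."""
    labels = {}
    for gram in ngrams(raw_title):
        labels.update(db["ngrams"].get(gram, {}))
    return labels

def normalize(raw_title, db):
    """Map a raw title to the canonical title with the most similar labeled attributes.

    Similarity here is Jaccard overlap of (attribute, label) pairs, an assumed
    scoring rule. Returns None when no labels match.
    """
    raw = set(label_title(raw_title, db).items())
    best_title, best_score = None, 0.0
    for title, labels in db["titles"].items():
        canon = set(labels.items())
        union = raw | canon
        score = len(raw & canon) / len(union) if union else 0.0
        if score > best_score:
            best_title, best_score = title, score
    return best_title

db = build_database(CANONICAL_TITLES, ANNOTATIONS)
print(normalize("Senior Software Engineer - Platform Team", db))
print(normalize("Staff Nurse", db))
```

In this sketch a raw title that shares no labeled n-gram with any canonical title is left unmapped. Because only n-grams that occur in canonical titles are stored, variants such as abbreviations would need their own annotated entries.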