Neologism classification techniques with trigrams and longest common subsequences

Bibliographic details
Main authors: Malak, Michael; Rivas, Luis E; Kreider, Mark Lee
Format: Patent
Language: English
Description
Summary: Techniques are provided for identifying attributes associated with a neologism or an unknown word or name. Real world characteristics can be predicted for the neologism. Trigrams are identified for an input word and word embedding model vector values are calculated for the identified trigrams and entered into a matrix. Trigrams are identified for nearest names. Classification values are calculated based on the trigrams for the input word and the trigrams from the nearest names and the classification values are entered into the matrix. A convolutional neural network can process the matrix to identify one or more characteristics associated with the neologism.
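
The first two steps named in the summary and title, trigram identification and longest common subsequences, can be illustrated with a minimal sketch. This is not the patented method. It assumes character-level trigrams, and it assumes that "nearest names" are chosen by longest-common-subsequence length, which the summary does not state. The function names `char_trigrams`, `lcs_length`, and `nearest_names` are hypothetical, and the embedding and convolutional network stages are omitted.

```python
def char_trigrams(word):
    """Return the overlapping character trigrams of a word, in order.

    Assumption: trigrams are character-level and case-insensitive.
    Words shorter than three characters yield an empty list.
    """
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nearest_names(word, known_names, k=2):
    """Pick the k known names sharing the longest common subsequence with word.

    Assumption: LCS length is the similarity measure; the source summary
    does not say how nearest names are selected.
    """
    w = word.lower()
    ranked = sorted(known_names,
                    key=lambda name: lcs_length(w, name.lower()),
                    reverse=True)
    return ranked[:k]
```

In a fuller pipeline, the trigrams of the input word and of each nearest name would presumably be mapped to vectors and stacked into the matrix that the summary describes. That mapping depends on the embedding model, which is not specified here.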