ENCODER USING MACHINE-TRAINED TERM FREQUENCY WEIGHTING FACTORS THAT PRODUCES A DENSE EMBEDDING VECTOR

A computer-implemented technique generates a dense embedding vector that provides a distributed representation of input text. The technique includes: generating an input term-frequency (TF) vector of dimension g that includes frequency information relating to frequency of occurrence of terms in an i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: ULABALA, Surendra, WANG, Yan, WU, Ye, SACHETI, Arun, THAKKAR, Vishal, HU, Houdong
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A computer-implemented technique generates a dense embedding vector that provides a distributed representation of input text. The technique includes: generating an input term-frequency (TF) vector of dimension g that includes frequency information relating to frequency of occurrence of terms in an instance of input text; using a TF-modifying component to modify the term-specific frequency information in the input TF vector by respective machine-trained weighting factors, to produce an intermediate vector of dimension g; using a projection component to project the intermediate vector of dimension g into an embedding vector of dimension k, where k is less than g. Both the TF-modifying component and the projection component use respective machine-trained neural networks. An application performs any of a retrieval-based function, a recognition-based function, a recommendation-based function, a classification-based function, etc. based on the embedding vector.