Changing the Geometry of Representations: α -Embeddings for NLP Tasks

Word embeddings based on a conditional model are commonly used in Natural Language Processing (NLP) tasks to embed the words of a dictionary in a low dimensional linear space. Their computation is based on the maximization of the likelihood of a conditional probability distribution for each word of...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Entropy (Basel, Switzerland) Switzerland), 2021-02, Vol.23 (3), p.287
Hauptverfasser:	Volpi, Riccardo, Thakur, Uddhipan, Malagò, Luigi
Format:	Artikel
Sprache:	eng
Schlagworte:	attention mechanism information geometry word embeddings α-embeddings
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Word embeddings based on a conditional model are commonly used in Natural Language Processing (NLP) tasks to embed the words of a dictionary in a low dimensional linear space. Their computation is based on the maximization of the likelihood of a conditional probability distribution for each word of the dictionary. These distributions form a Riemannian statistical manifold, where word embeddings can be interpreted as vectors in the tangent space of a specific reference measure on the manifold. A novel family of word embeddings, called α-embeddings have been recently introduced as deriving from the geometrical deformation of the simplex of probabilities through a parameter α, using notions from Information Geometry. After introducing the α-embeddings, we show how the deformation of the simplex, controlled by α, provides an extra handle to increase the performances of several intrinsic and extrinsic tasks in NLP. We test the α-embeddings on different tasks with models of increasing complexity, showing that the advantages associated with the use of α-embeddings are present also for models with a large number of parameters. Finally, we show that tuning α allows for higher performances compared to the use of larger models in which additionally a transformation of the embeddings is learned during training, as experimentally verified in attention models.
ISSN:	1099-4300 1099-4300
DOI:	10.3390/e23030287