Effects of Similarity Score Functions in Attention Mechanisms on the Performance of Neural Question Answering Systems

Published in: Neural Processing Letters, 2022-06, Vol. 54 (3), pp. 2283-2302
Authors: Shen, Yuanyuan; Lai, Edmund M.-K.; Mohaghegh, Mahsa
Format: Article
Language: English
Abstract: Attention mechanisms have been incorporated into many neural network-based natural language processing (NLP) models. They enhance the ability of these models to learn and reason with long input texts. A critical part of such mechanisms is the computation of attention similarity scores between two elements of the texts using a similarity score function. Given that these models have different architectures, it is difficult to comparatively evaluate the effectiveness of different similarity score functions. In this paper, we propose a baseline model that captures the common components of recurrent neural network-based Question Answering (QA) systems found in the literature. By isolating the attention function, this baseline model allows us to study the effects of different similarity score functions on the performance of such systems. Experimental results show that a trilinear function produced the best results among the commonly used functions. Based on these insights, a new T-trilinear similarity function is proposed which achieves higher predictive EM and F1 scores than the existing functions. A heatmap visualization of the attention score matrix explains why this T-trilinear function is effective.
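The trilinear function mentioned in the abstract is commonly defined (e.g. in BiDAF-style QA models) as sim(c, q) = w^T [c ; q ; c ∘ q], where c is a context token vector, q a question token vector, ∘ element-wise multiplication, and w a learned weight vector. The sketch below is a minimal NumPy illustration of that standard trilinear form and of how the resulting score matrix is normalized into attention weights. The function name, shapes, and random weights are placeholders, and the paper's proposed T-trilinear variant is not reproduced here because its definition is not given in the abstract.

```python
import numpy as np

def trilinear_score(C, Q, w):
    """BiDAF-style trilinear similarity: score[i, j] = w . [c_i ; q_j ; c_i * q_j].

    C: (n, d) context token vectors, Q: (m, d) question token vectors,
    w: (3d,) weight vector. Returns the (n, m) attention score matrix.
    """
    d = C.shape[1]
    w_c, w_q, w_cq = w[:d], w[d:2 * d], w[2 * d:]
    # The three terms of the dot product, computed without materializing
    # the concatenated (3d)-dimensional vectors.
    return C @ w_c[:, None] + (Q @ w_q)[None, :] + (C * w_cq) @ Q.T

rng = np.random.default_rng(0)
d = 8
C = rng.normal(size=(5, d))    # 5 context tokens (hypothetical)
Q = rng.normal(size=(3, d))    # 3 question tokens (hypothetical)
w = rng.normal(size=3 * d)     # stands in for a learned weight vector

scores = trilinear_score(C, Q, w)                        # (5, 3) score matrix
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)                  # row-wise softmax
print(scores.shape, attn.round(3))
```

Each row of attn is one context token's attention distribution over the question tokens; visualizing this matrix as a heatmap is the kind of inspection the abstract refers to when explaining why the proposed function is effective.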
ISSN: 1370-4621, 1573-773X
DOI: 10.1007/s11063-021-10730-4