Self-supervised phrase embedding method by fusing internal and external semantic information of phrases

Bibliographic details
Published in: Multimedia Tools and Applications, 2023-05, Vol. 82 (13), p. 20477-20495
Main authors: Li, Rongsheng; Wei, Chi; Huang, Shaobin; Yan, Naiyu
Format: Article
Language: English
Online access: Full text
Abstract: The quality of phrase embeddings affects the performance of many downstream NLP tasks. Most existing phrase embedding methods struggle to achieve satisfactory performance, or sacrifice robustness in pursuit of performance. To address these problems, this paper proposes an effective phrase embedding method called Multi-loss Optimized Self-supervised Phrase Embedding (MOSPE). The method feeds a pre-trained phrase embedding and its component word embeddings into an encoder composed of an LSTM, a fully connected network, and an attention mechanism to obtain an embedding vector. The entire network is then trained through multiple loss functions that relate this embedding vector back to the original inputs. The LSTM captures the sequence information of the component words, the attention mechanism captures the importance of the individual component words, and the fully connected network integrates this information. The loss functions are weighted mean squared error losses: they use the cosine similarity between each component word embedding and the distributed embedding of the phrase to measure that word's importance weight, and they measure the ratio of the phrase's internal to external information through the cosine similarity between the element-wise sum of the component word embeddings and the phrase embedding. The method requires no supervision data and yields well-represented phrase embeddings. We use four evaluation methods to conduct experiments on three widely used phrase embedding evaluation datasets. The Spearman correlation coefficient of the method reaches 0.686 on the English phrase similarity dataset and 0.846 on the Chinese phrase similarity dataset, and the F1 score reaches 0.715 on the phrase classification dataset. Overall, the method outperforms strong baselines with good robustness.
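
To make the described pipeline concrete, the following is a minimal PyTorch sketch of an encoder of this shape (LSTM + attention + fully connected layer) and a cosine-weighted MSE loss mixing an internal term (reconstructing the component word embeddings) with an external term (reconstructing the distributed phrase embedding). Everything here is an illustrative assumption: the names (MospeStyleEncoder, weighted_mse_loss), layer sizes, the softmax normalization of the importance weights, and the mixing coefficient alpha are not taken from the paper, whose abstract does not specify this level of detail.

# Hypothetical sketch of a MOSPE-style encoder and loss (assumes PyTorch).
# Names, sizes, and wiring are illustrative guesses from the abstract,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MospeStyleEncoder(nn.Module):
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)  # sequence info of component words
        self.attn = nn.Linear(hidden, 1)                    # importance of each component word
        self.fc = nn.Linear(hidden, dim)                    # integrates the above information

    def forward(self, word_embs: torch.Tensor) -> torch.Tensor:
        # word_embs: (batch, num_words, dim) pre-trained component word embeddings
        states, _ = self.lstm(word_embs)                    # (batch, num_words, hidden)
        weights = F.softmax(self.attn(states), dim=1)       # attention over component words
        pooled = (weights * states).sum(dim=1)              # weighted summary of the phrase
        return self.fc(pooled)                              # (batch, dim) phrase embedding

def weighted_mse_loss(pred, word_embs, phrase_emb):
    # Importance weight of each component word: cosine similarity between
    # its embedding and the distributed phrase embedding (per the abstract).
    w = F.softmax(F.cosine_similarity(word_embs, phrase_emb.unsqueeze(1), dim=-1), dim=-1)
    # Internal term: weighted MSE against the component word embeddings.
    internal = (w * ((pred.unsqueeze(1) - word_embs) ** 2).mean(dim=-1)).sum(dim=-1)
    # External term: MSE against the distributed phrase embedding.
    external = F.mse_loss(pred, phrase_emb, reduction="none").mean(dim=-1)
    # Mixing ratio from the cosine similarity between the element-wise sum
    # of the component word embeddings and the phrase embedding (assumed form).
    alpha = F.cosine_similarity(word_embs.sum(dim=1), phrase_emb, dim=-1).clamp(0, 1)
    return (alpha * internal + (1 - alpha) * external).mean()

# Usage with random stand-in embeddings:
enc = MospeStyleEncoder(dim=300)
words = torch.randn(8, 4, 300)    # 8 phrases, 4 component words each
phrase = torch.randn(8, 300)      # pre-trained distributed phrase embeddings
loss = weighted_mse_loss(enc(words), words, phrase)
loss.backward()

Note that both loss targets are derived from the inputs themselves, which is what makes the training self-supervised in the sense the abstract claims: no labeled data is required.
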
ISSN: 1380-7501; 1573-7721
DOI: 10.1007/s11042-022-14312-x