CLSEP: Contrastive learning of sentence embedding with prompt

Bibliographic details
Published in: Knowledge-Based Systems, 2023-04, Vol. 266, Article 110381
Authors: Wang, Qian; Zhang, Weiqi; Lei, Tianyi; Cao, Yu; Peng, Dezhong; Wang, Xu
Format: Article
Language: English
Online access: Full text
Abstract: Sentence embedding, which aims to learn an effective representation of a sentence, is beneficial for downstream tasks. By utilizing contrastive learning, most recent sentence embedding methods have achieved promising results. However, these methods adopt simple data augmentation strategies to obtain variants of a sentence, limiting the representation ability of the resulting embeddings. In addition, they simply adopt the original contrastive learning framework developed for image representation, which is not well suited to learning sentence embeddings. To address these issues, we propose a method dubbed unsupervised contrastive learning of sentence embedding with prompt (CLSEP), aiming to provide effective sentence embeddings by utilizing the prompt mechanism. Meanwhile, we propose a novel data augmentation strategy for text named partial word vector augmentation (PWVA), which augments the data in the word embedding space and thus retains more semantic information. Finally, we introduce supervised contrastive learning of sentence embedding (SuCLSE) and verify the effectiveness of PWVA on the natural language inference (NLI) task. Extensive experiments on the STS datasets demonstrate that, with the proposed PWVA strategy, CLSEP and SuCLSE are superior to the previous best methods. The code is available at https://github.com/qianandfei/CLSEP-Contrastive-Learning-of-Sentence-Embedding-with-Prompt.
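
For illustration only, below is a minimal sketch of the kind of pipeline the abstract describes: an in-batch InfoNCE contrastive objective whose positive view of each sentence is produced by perturbing only part of its word embeddings, loosely mirroring the PWVA idea of augmenting in the word embedding space. All function names, hyperparameters, and the mean-pooling stand-in for an encoder are assumptions made for the sketch, not the authors' implementation; see the linked repository for the actual code.

```python
# Hedged sketch (not the authors' code): InfoNCE contrastive loss where the
# positive view of each sentence comes from perturbing a random subset of its
# word embeddings, loosely in the spirit of PWVA (augmentation in word-embedding space).
import torch
import torch.nn.functional as F


def partial_word_vector_augment(word_embs: torch.Tensor, ratio: float = 0.3,
                                noise_std: float = 0.01) -> torch.Tensor:
    """Add small Gaussian noise to a random subset of word vectors.

    word_embs: (batch, seq_len, dim) word embeddings of a sentence batch.
    Only roughly `ratio` of the token positions are perturbed, so most of the
    original semantics is kept (an assumption about how PWVA-style augmentation behaves).
    """
    batch, seq_len, _ = word_embs.shape
    mask = (torch.rand(batch, seq_len, 1, device=word_embs.device) < ratio).float()
    noise = noise_std * torch.randn_like(word_embs)
    return word_embs + mask * noise


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Standard in-batch InfoNCE: each sentence's augmented view is its positive,
    every other sentence in the batch serves as a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


# Toy usage with random tensors standing in for an encoder's token outputs.
word_embs = torch.randn(8, 16, 128)                    # 8 sentences, 16 tokens, dim 128
view1 = word_embs.mean(dim=1)                          # placeholder sentence embedding (mean pooling)
view2 = partial_word_vector_augment(word_embs).mean(dim=1)
loss = info_nce_loss(view1, view2)
print(float(loss))
```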
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2023.110381