Synonym recognition from short texts: A self-supervised learning approach

Bibliographic details
Published in:	Expert systems with applications 2023-08, Vol.224, p.119966, Article 119966
Main authors:	Mu, Lin; Jin, Peiquan; Zhang, Yiwen; Zhong, Hong; Zhao, Jie
Format:	Article
Language:	English
Subjects:	Clustering; Self-supervised; Short-text; Synonyms
Online access:	Full text

Description
Abstract:	Synonyms are different expressions for the same entity in a text, and they affect the performance of entity-centric text mining. Synonym recognition has therefore become a promising research topic in recent years. However, most existing approaches are based on structured, semi-structured, or long text, and only a few studies have tackled synonym recognition in short texts on social networks. Synonym recognition in short texts faces several research challenges. First, short texts contain a large number of unlabeled synonyms. Second, many new words appear in short texts on social networks. Therefore, in this paper, we propose a self-supervised learning method to recognize synonyms in short texts, which consists of two steps. First, we use a clustering algorithm to generate a pseudo-label for each expression. Second, we input the co-occurrence information and the character information of the expressions into a deep-learning model to obtain the feature representation of each expression. The two steps are executed iteratively until the algorithm converges. To evaluate the proposed method, we conducted extensive experiments on a real short-text dataset, and the results suggest that the method is effective.
•We propose a self-supervised learning method to recognize synonyms in short texts.
•We propose to use two types of information to extract the features of expressions.
•We report experimental results on a real microblog dataset.
ISSN:	0957-4174, 1873-6793
DOI:	10.1016/j.eswa.2023.119966
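
The two-step loop described in the abstract (cluster to obtain pseudo-labels, refine the representations, repeat until nothing changes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: k-means stands in for the unspecified clustering algorithm, pulling each vector toward its cluster centroid stands in for training the deep-learning model, and every function and parameter name here is hypothetical.

```python
import random
from collections import Counter


def char_bigrams(expr):
    """Character information: counts of character bigrams in the expression."""
    padded = f"^{expr}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cooccurrence(expr, texts):
    """Co-occurrence information: tokens seen in the same short text."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        if expr in tokens:
            counts.update(t for t in tokens if t != expr)
    return counts


def featurize(exprs, texts):
    """Concatenate both feature types into one dense vector per expression."""
    raw = []
    for e in exprs:
        feats = Counter({f"c:{k}": v for k, v in char_bigrams(e).items()})
        feats.update({f"w:{k}": v for k, v in cooccurrence(e, texts).items()})
        raw.append(feats)
    vocab = sorted({k for f in raw for k in f})
    return [[float(f[k]) for k in vocab] for f in raw]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, rng, steps=20):
    """Plain k-means; an assumption, the abstract names no specific algorithm."""
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(steps):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def canonical(labels):
    """Renumber labels by first appearance so partitions can be compared."""
    seen = {}
    return [seen.setdefault(l, len(seen)) for l in labels]


def self_train(exprs, texts, k, rounds=10, pull=0.5, seed=0):
    """Alternate pseudo-labelling and representation refinement."""
    rng = random.Random(seed)
    vectors = featurize(exprs, texts)
    previous = None
    for _ in range(rounds):
        labels, centroids = kmeans(vectors, k, rng)   # step 1: pseudo-labels
        labels = canonical(labels)
        if labels == previous:                        # partition is stable
            break
        previous = labels
        # Step 2 (stand-in for the deep model): move each vector part of the
        # way toward the centroid of its pseudo-labelled cluster.
        by_cluster = {}
        for v, l in zip(vectors, labels):
            by_cluster.setdefault(l, []).append(v)
        means = {l: [sum(col) / len(vs) for col in zip(*vs)]
                 for l, vs in by_cluster.items()}
        vectors = [[x + pull * (m - x) for x, m in zip(v, means[l])]
                   for v, l in zip(vectors, labels)]
    return labels


if __name__ == "__main__":
    texts = ["flying to nyc tomorrow", "flying to bigapple tomorrow",
             "sunny beach in la", "sunny beach in lax"]
    print(self_train(["nyc", "bigapple", "la", "lax"], texts, k=2))
```

Expressions that share a pseudo-label at the end of the loop are the candidate synonym groups. In the method the abstract describes, the refinement step would instead train a model on the pseudo-labels and re-encode each expression.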