Synonym recognition from short texts: A self-supervised learning approach
Published in: | Expert Systems with Applications 2023-08, Vol.224, p.119966, Article 119966 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Synonyms are different expressions that refer to the same entity in a text, and they affect the performance of entity-centric text mining. Synonym recognition has therefore become a promising research topic in recent years. However, most existing approaches are based on structured, semi-structured, or long text, and only a few studies have tackled synonym recognition in short texts on social networks. Synonym recognition in short texts faces several research challenges. First, short texts contain a large number of unlabeled synonyms. Second, many new words appear in short texts on social networks. In this paper, we therefore propose a self-supervised learning method for recognizing synonyms in short texts, which consists of two steps. First, we use a clustering algorithm to generate a pseudo-label for each expression. Second, we input the co-occurrence information and the character information of the expressions into a deep-learning model to obtain a feature representation of each expression. The two steps are executed iteratively until the algorithm converges. To evaluate the proposed method, we conducted extensive experiments on a real short-text dataset, and the results suggest that it is effective.
• We propose a self-supervised learning method to recognize synonyms in short texts.
• We propose to use two types of information to extract the features of expressions.
• We report experimental results on a real microblog dataset. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2023.119966 |
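
The two-step loop described in the abstract (cluster to obtain pseudo-labels, update the representations, repeat until convergence) can be illustrated with a small sketch. This is not the authors' implementation: the record gives no details of the clustering algorithm or the deep-learning model, so the sketch assumes plain k-means for step 1 and replaces the deep model in step 2 with a trivial update that pulls each vector toward the centroid of its pseudo-label. All function names, the character-bigram features, the `step` parameter, and the toy data are hypothetical.

```python
from collections import Counter
import math

def char_bigrams(expr):
    """Character information: counts of character bigrams in the expression."""
    padded = f"#{expr}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def cooccurrence(expr, texts):
    """Co-occurrence information: counts of words sharing a text with expr."""
    counts = Counter()
    for tokens in texts:
        if expr in tokens:
            counts.update(t for t in tokens if t != expr)
    return counts

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def build_features(expressions, texts):
    """Combine both feature types into L2-normalized dense vectors."""
    raw = []
    for e in expressions:
        feats = Counter({("co", k): v for k, v in cooccurrence(e, texts).items()})
        feats.update({("ch", k): v for k, v in char_bigrams(e).items()})
        raw.append(feats)
    vocab = sorted({k for f in raw for k in f})
    return [normalize([float(f[k]) for k in vocab]) for f in raw]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def self_train(expressions, texts, k, rounds=10, step=0.5):
    """Alternate pseudo-labeling and representation updates until labels stop changing."""
    reps = build_features(expressions, texts)
    labels = None
    for _ in range(rounds):
        new_labels, centroids = kmeans(reps, k)   # step 1: pseudo-labels
        if new_labels == labels:                  # converged: labels unchanged
            break
        labels = new_labels
        # Step 2 (stand-in for the deep model, an assumption of this sketch):
        # move each representation part of the way toward its cluster centroid.
        reps = [normalize([(1 - step) * x + step * c
                           for x, c in zip(v, centroids[lab])])
                for v, lab in zip(reps, labels)]
    return dict(zip(expressions, labels))

# Toy tokenized short texts (made up for illustration).
texts = [
    ["nyc", "subway", "pizza"],
    ["newyork", "subway", "pizza"],
    ["la", "beach", "traffic"],
    ["losangeles", "beach", "traffic"],
]
clusters = self_train(["nyc", "newyork", "la", "losangeles"], texts, k=2)
print(clusters)  # "nyc"/"newyork" share one pseudo-label, "la"/"losangeles" the other
```

In this toy run the shared context words and the shared leading character bigram are enough to group each pair. A real system would replace the centroid-pull update with a trained encoder, which is where the paper's deep-learning model would sit.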