Challenging the Boundaries of Unsupervised Learning for Semantic Similarity

Bibliographic details
Published in:	IEEE Access, 2019, Vol. 7, pp. 16291-16308
Main authors:	Pawar, Atish; Mago, Vijay
Format:	Article
Language:	English
Keywords:	Algorithms, Corpus, Datasets, Domains, lexical database, Natural language processing, Neural networks, Rivers, semantic analysis, Semantics, sentence similarity, Sentences, Similarity, Unsupervised learning, word similarity

Description
Abstract:	The semantic analysis field has a crucial role to play in research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in natural language processing, and the problem differs significantly from one domain of operation to another. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpus-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and a mean human similarity dataset, the methodology achieves high correlation values for both word similarity (r = 0.8753) and sentence similarity (r = 0.8793) with respect to the Rubenstein and Goodenough standard, and for the SICK dataset (r = 0.8324¹), outperforming other unsupervised models.
¹ After eliminating the outliers, which constitute 3.75% of the 4927 statement pairs.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2891692
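The abstract states that word similarity is computed with an edge-based approach over a lexical database. As a rough illustration only, the sketch below implements one widely used edge-based measure, the path-length/subsumer-depth formula of Li et al. (2003), assuming the word-level step resembles it; it is not the authors' code. The toy taxonomy `PARENT` and the helpers `ancestors`, `depth`, `lcs_and_path_length` and `word_similarity` are hypothetical, and the constants alpha = 0.2 and beta = 0.45 are the values commonly cited for that formula, used here as assumptions.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent). It stands in for a
# lexical database such as WordNet; it is NOT data from the paper.
PARENT = {
    "entity": None,
    "organism": "entity",
    "artifact": "entity",
    "animal": "organism",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "vehicle": "artifact",
    "car": "vehicle",
    "bicycle": "vehicle",
}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth in edges below the root (root = 0); an assumed convention."""
    return len(ancestors(node)) - 1


def lcs_and_path_length(w1, w2):
    """Return the lowest common subsumer of w1 and w2 and the number of
    edges on the path between them through it (the toy taxonomy is a tree)."""
    up1 = ancestors(w1)
    up2 = ancestors(w2)
    for i, node in enumerate(up1):
        if node in up2:
            return node, i + up2.index(node)
    raise ValueError(f"{w1!r} and {w2!r} share no common ancestor")


def word_similarity(w1, w2, alpha=0.2, beta=0.45):
    """Edge-based similarity in the style of Li et al. (2003):
    exp(-alpha * path_len) decays with the path length, and
    tanh(beta * h) grows with the depth h of the subsumer."""
    subsumer, path_len = lcs_and_path_length(w1, w2)
    h = depth(subsumer)
    return math.exp(-alpha * path_len) * math.tanh(beta * h)


# Usage: siblings under a deep subsumer score higher than pairs that
# only meet at the root.
for a, b in [("dog", "cat"), ("car", "bicycle"), ("dog", "car")]:
    print(a, b, round(word_similarity(a, b), 4))
```

With depth counted in edges, any pair whose only common ancestor is the root scores 0, because the tanh factor vanishes there. A real implementation would query a lexical database such as WordNet instead of a hand-built dictionary and would still need a separate step to aggregate word scores into a sentence score; neither is shown here.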