A survey of semantic relatedness evaluation datasets and procedures

Bibliographic details
Published in: The Artificial intelligence review 2020-08, Vol.53 (6), p.4407-4448
Main authors: Hadj Taieb, Mohamed Ali; Zesch, Torsten; Ben Aouicha, Mohamed
Format: Article
Language: English
Online access: Full text
Description
Summary: Semantic relatedness between words is a core concept in natural language processing. While countless approaches have been proposed, determining which one works best is still a challenging task. Thus, in this article, we give a comprehensive overview of the evaluation protocols and datasets for semantic relatedness, covering both intrinsic and extrinsic approaches. On the intrinsic side, we give an overview of more than 100 evaluation datasets in 20 different languages from a wide range of domains. To give researchers better guidance for selecting a suitable dataset or even building new and better ones, we also describe the construction and annotation process of the datasets. We also briefly describe the evaluation metrics most frequently used for intrinsic evaluation. On the extrinsic side, several applications involving semantic relatedness measures are detailed through recent research works, with an explanation of the benefit the measures bring.
ISSN: 0269-2821, 1573-7462
DOI: 10.1007/s10462-019-09796-3