Wikifying software artifacts

Context The computational linguistics community has developed tools, called wikifiers, to identify links to Wikipedia articles from free-form text. Software engineering research can leverage wikifiers to add semantic information to software artifacts. However, no empirically-grounded basis exists to...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Empirical software engineering : an international journal 2021-03, Vol.26 (2), Article 31
Hauptverfasser: Nassif, Mathieu, Robillard, Martin P.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Context The computational linguistics community has developed tools, called wikifiers, to identify links to Wikipedia articles from free-form text. Software engineering research can leverage wikifiers to add semantic information to software artifacts. However, no empirically-grounded basis exists to choose an effective wikifier and to configure it for the software domain, on which wikifiers were not specifically trained. Objective We conducted a study to guide the selection of a wikifier and its configuration for applications in the software domain, and to measure what performance can be expected of wikifiers. Method We applied six wikifiers, with multiple configurations, to a sample of 500 Stack Overflow posts. We manually annotated the 41 124 articles identified by the wikifiers as correct or not to compare their precision and recall. Results Each wikifier, in turn, achieved the highest precision, between 13% and 82%, for different thresholds of recall, from 60% to 5%. However, filtering the wikifiers’ output with a whitelist can considerably improve the precision above 79% for recall up to 30%, and above 47% for recall up to 60%. Conclusions Results reported in each wikifier’s original article cannot be generalized to software-specific documents. Given that no wikifier performs universally better than all others, we provide empirically grounded insights to select a wikifier for different scenarios, and suggest ways to further improve their performance for the software domain.
ISSN:1382-3256
1573-7616
DOI:10.1007/s10664-020-09918-4