Automatic author name disambiguation by differentiable feature selection

Author name disambiguation (AND) is the task of resolving the ambiguity problem in bibliographic databases, where distinct real-world authors may share the same name or same author may have distinct names. The aim of AND is to split the name-ambiguous entities (articles) into the corresponding autho...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of information science 2023-09
Hauptverfasser: Fang, ZhiJian, Zhuo, Yue, Xu, Jinying, Tang, Zhechong, Jia, Zijie, Zhang, HuaXiong
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Author name disambiguation (AND) is the task of resolving the ambiguity problem in bibliographic databases, where distinct real-world authors may share the same name or same author may have distinct names. The aim of AND is to split the name-ambiguous entities (articles) into the corresponding authors. Existing AND algorithms mainly focus on designing different similarity metrics between two ambiguous articles. However, most previous methods empirically select and process the features of entities, then use features to predict the similarity by data-driven models. In this article, we are motivated by natural questions: Which features are most useful for splitting name-ambiguous entities? Can they be automatically determined by an optimisation approach rather than heuristic feature engineering? Therefore, we proposed a novel end-to-end differentiable feature selection algorithm, automatically searching the optimal features for AND task (AAND). AAND optimises the discrete feature selection by differentiable Gumbel-Softmax, leading to the joint learning of feature selection policy and similarity prediction model. The experiments are conducted on a benchmark data set, S2AND, which harmonises eight different AND data sets. The results show that the performance of our proposal is superior to the advanced AND methods and feature selection algorithms. Meanwhile, deep insights into AND features are also given.
ISSN:0165-5515
1741-6485
DOI:10.1177/01655515231193859