Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study

In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included pub...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Scientometrics 2022-10, Vol.127 (10), p.5863-5877
1. Verfasser: Donner, Paul
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among high numbers of potential candidates using features of the thesis and publication records. The results indicate that this approach results in good match quality in general and with the best results attained by the “random forest” classification algorithm.
ISSN:0138-9130
1588-2861
DOI:10.1007/s11192-022-04480-w