Predicting coauthorship using bibliographic network embedding

Bibliographic Details
Published in:	Journal of the Association for Information Science and Technology 2023-04, Vol.74 (4), p.388-401
Main authors:	Zhu, Yongjun; Quan, Lihong; Chen, Pei‐Ying; Kim, Meen Chul; Che, Chao
Format:	Article
Language:	English
Subjects:	Algorithms; Bibliographic literature; Bibliographic records; Bibliographies; Co authorship; Data models; Embedding; Machine learning; Predictions; Translation; Translation methods and strategies; Writers
Online access:	Full text

Description
Summary:	Coauthorship prediction applies predictive analytics to bibliographic data to predict authors who are highly likely to be coauthors. In this study, we propose an approach for coauthorship prediction based on bibliographic network embedding through a graph‐based bibliographic data model that can be used to model common bibliographic data, including papers, terms, sources, authors, departments, research interests, universities, and countries. A real‐world dataset released by AMiner that includes more than 2 million papers, 8 million citations, and 1.7 million authors was integrated into a large bibliographic network using the proposed bibliographic data model. Translation‐based methods were applied to the entities and relationships to generate their low‐dimensional embeddings while preserving their connectivity information in the original bibliographic network. We applied machine learning algorithms to embeddings that represent the coauthorship relationships of two authors and achieved high prediction performance. The reference model, which combines a network embedding size of 100, the most basic translation‐based method, and a gradient boosting method, achieved an F1 score of 0.9, and even higher scores are obtainable with different embedding sizes and more advanced embedding methods. Thus, the strengths of the proposed approach lie in its customizable components under a unified framework.
ISSN:	2330-1635, 2330-1643
DOI:	10.1002/asi.24711
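
Illustrative sketch (not part of the catalog record): the summary describes a pipeline of translation‐based embeddings, author‐pair features, and a classifier evaluated by F1 score. The Python below is a minimal sketch of those pieces under stated assumptions. The record does not name the translation‐based method, so a TransE‐style distance is assumed here. The concatenation scheme in `pair_features`, the fixed distance threshold (a stand‐in for the gradient boosting classifier), and all embedding values are hypothetical and not taken from the paper.

```python
import math


def transe_distance(head, relation, tail):
    """TransE-style distance ||head + relation - tail||_2 (smaller = more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def pair_features(author_a, author_b):
    """Concatenate two author embeddings into one feature vector (assumed scheme)."""
    return list(author_a) + list(author_b)


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 3-dimensional embeddings (made-up values; the reference model uses size 100).
alice = [0.1, 0.2, 0.3]
bob = [0.4, 0.1, 0.5]
carol = [-0.9, 0.8, -0.7]
coauthor_of = [0.3, -0.1, 0.2]  # hypothetical embedding of the coauthorship relation

pairs = [(alice, bob), (alice, carol)]
labels = [1, 0]  # hypothetical ground truth: alice-bob coauthored, alice-carol did not

# Feature vectors that a classifier (e.g. gradient boosting) could be trained on.
features = [pair_features(a, b) for a, b in pairs]

# Stand-in for the classifier: predict coauthorship when the distance is small.
THRESHOLD = 0.5  # arbitrary value chosen for this toy example
preds = [1 if transe_distance(a, coauthor_of, b) < THRESHOLD else 0 for a, b in pairs]

print("F1 on toy pairs:", f1_score(labels, preds))
```

In a fuller pipeline, the vectors in `features` would be passed to a trained gradient boosting classifier in place of the threshold rule, and the embeddings would be learned from the bibliographic network instead of being written by hand.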