Exploring scientific trajectories of a large-scale dataset using topic-integrated path extraction

Bibliographic details
Published in: Journal of informetrics, 2022-02, Vol. 16 (1), p. 101242, Article 101242
Main authors: Kim, Erin H.J., Jeong, Yoo Kyung, Kim, YongHwan, Song, Min
Format: Article
Language: English
Description
Summary:
• Main path analysis (MPA) is the most widely accepted approach to tracing knowledge transfer in a research field. In this study, we extracted multiple longest paths from the citation network of a multidisciplinary academic field and integrated topic modeling into the extracted paths.
• We considered three main aspects of trajectory analysis when analyzing the documents represented by the extracted paths: emergence, authority, and topic dynamics.
• We argued that a small-scale dataset yields relatively short main paths. We exploited the longest path algorithm, which searches paths exhaustively, enabling the identification of more sequences of knowledge transfer in large-scale datasets.
• This study revealed the knowledge development tracks of topics in the healthcare informatics field. Topic-integrated longest paths highlight important topics that move from the main path to extended subfields.

Main path analysis (MPA) is the most widely accepted approach to tracing knowledge transfer in a research field. In this study, we extract multiple longest paths from the citation network of a multidisciplinary academic field and integrate topic modeling into the extracted paths. We consider three main aspects of trajectory analysis when analyzing the documents represented by the extracted paths: emergence, authority, and topic dynamics. For path extraction, we adopt the longest path algorithm, which consists of three steps: 1) topological sort, 2) edge relaxation, and 3) multiple path extraction. For topic integration into the multiple paths, we employ latent Dirichlet allocation (LDA): using the topic-document matrix that LDA derives, each article in the citation network is labeled with the topic that has the highest probability for that article. We conduct a series of experiments to examine the results on a healthcare informatics dataset obtained from PubMed.
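
The three path-extraction steps and the topic labeling described in the summary can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes an acyclic citation network with unit edge weights, edges pointing from the cited article to the citing article, and a document-topic table given as a plain dictionary. The names `longest_paths`, `label_topics`, and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def longest_paths(edges):
    """Return (length, paths) for all longest paths in a DAG of (src, dst) edges."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in set(edges):  # set() drops duplicate citations
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    # Step 1: topological sort (Kahn's algorithm).
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("citation network contains a cycle")

    # Step 2: edge relaxation in topological order (assumed unit weights).
    # Every predecessor that achieves the best distance is kept, so ties
    # are not lost.
    dist = {n: 0 for n in nodes}
    preds = {n: [] for n in nodes}
    for u in order:
        for v in succ[u]:
            if dist[u] + 1 > dist[v]:
                dist[v] = dist[u] + 1
                preds[v] = [u]
            elif dist[u] + 1 == dist[v]:
                preds[v].append(u)

    # Step 3: multiple path extraction by backtracking from every end node
    # whose distance equals the maximum.
    best = max(dist.values(), default=0)

    def backtrack(node):
        if not preds[node]:
            return [[node]]
        return [p + [node] for u in preds[node] for p in backtrack(u)]

    paths = [p for n in nodes if dist[n] == best for p in backtrack(n)]
    return best, sorted(paths)


def label_topics(doc_topic):
    """Label each document with the index of its highest-probability topic."""
    return {
        doc: max(range(len(probs)), key=probs.__getitem__)
        for doc, probs in doc_topic.items()
    }


# Toy citation network: A is cited by B, C, and E; B and C are cited by D; ...
edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
# Toy document-topic probabilities (rows would come from a fitted LDA model).
doc_topic = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.2, 0.5, 0.3],
    "D": [0.1, 0.8, 0.1],
    "E": [0.1, 0.2, 0.7],
}

length, paths = longest_paths(edges)
labels = label_topics(doc_topic)
topic_paths = [[labels[doc] for doc in path] for path in paths]
```

In this toy network two paths tie for the maximum length of three edges, so both are returned; mapping each article to its dominant topic turns every path into a topic sequence that can be inspected for topic shifts. The recursive backtracking is kept for brevity and would need an iterative form for very long paths.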
ISSN: 1751-1577, 1875-5879
DOI: 10.1016/j.joi.2021.101242