Co-citation and Co-authorship Networks of Statisticians
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | We collected and cleaned a large data set on publications in statistics. The
data set consists of the coauthor relationships and citation relationships of
83,331 papers published in 36 representative journals in statistics,
probability, and machine learning, spanning 41 years. The data set allows us to
construct many different networks, and motivates a number of research problems
about the research patterns and trends, research impacts, and network topology
of the statistics community. In this paper we focus on (i) using the citation
relationships to estimate the research interests of authors, and (ii) using the
coauthor relationships to study the network topology.
Using co-citation networks we constructed, we discover a "statistics
triangle", reminiscent of the statistical philosophy triangle (Efron, 1998). We
propose new approaches to constructing the "research map" of statisticians, as
well as the "research trajectory" for a given author to visualize how his/her
research interests evolve. Using co-authorship networks we constructed, we
discover a multi-layer community tree and produce a Sankey diagram to visualize
the author migrations in different sub-areas. We also propose several new
metrics for research diversity of individual authors.
We find that "Bayes", "Biostatistics", and "Nonparametric" are three primary
areas in statistics. We also identify 15 sub-areas, each of which can be viewed
as a weighted average of the primary areas, and identify several underlying
reasons for the formation of co-authorship communities. We also find that the
research interests of statisticians have evolved significantly in the 41-year
time window we studied: some areas (e.g., biostatistics and high-dimensional data
analysis) have become increasingly popular. |
---|---|
DOI: | 10.48550/arxiv.2204.11194 |