Important citation identification by exploiting the syntactic and contextual information of citations

Citations are not equally important. Researchers presented different models and techniques to identify important citations. However, the features used in these work are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to di...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Scientometrics 2020-12, Vol.125 (3), p.2109-2129
Hauptverfasser: Wang, Mingyang, Zhang, Jiaqi, Jiao, Shijia, Zhang, Xiangrong, Zhu, Na, Chen, Guangsheng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Citations are not equally important. Researchers presented different models and techniques to identify important citations. However, the features used in these work are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to distinguish important and non-important citations by examining the syntactic and contextual information of citations. Among them, syntactic features reflect the statistical perspective characteristics brought by citation behavior, such as the cited frequency and citation position of the cited article in the citing ones. Contextual features reflect the semantic content characteristics brought by citations, such as the intent and polarity of citations. Three feature selection algorithms, Pearson correlation coefficient, relief-F and entropy weight method, were used to calculate the contribution of each index on distinguishing different kinds of citations. On this basis, key features that can better identify the important citations were screened out. Three classifiers of support vector machine, KNN and random forest were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the framework proposed in this paper can achieve better classification performance compared with contemporary state-of-the-art research. The syntactic and contextual features of citation are of great value in identifying important citations.
ISSN:0138-9130
1588-2861
DOI:10.1007/s11192-020-03677-1