An in-text citation classification predictive model for a scholarly search system
Saved in:
Published in: | Scientometrics 2021-07, Vol.126 (7), p.5509-5529 |
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | We argue that citations in scholarly documents do not always perform equivalent functions or possess equal importance. To address this problem, we worked with a corpus of over 21,000 citations from the Association for Computational Linguistics, from which 465 citations were randomly selected and annotated by experts as either important or unimportant. We trained an array of machine-learning models on these annotated citations: Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). For the classification task, the selected models employed 15 novel features in three groups: contextual, quantitative, and qualitative. We show that the RF model outperformed the comparative model by 9.52%, achieving 92% area under the precision-recall curve. We present a prototype of a scientific publication search system based on the RF prediction model for feature engineering. This was used on a dataset of 4138 full-text articles indexed by PLOS ONE, comprising 31,839 unique references. The empirical evaluation shows that the proposed search system improves the visibility of a given scientific document by including, along with its own index terms, terms from the cited works that are predicted to be important. Overall, this yields improved results for user queries. |
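The classification setup described in the abstract can be sketched in a few lines. This is a minimal illustration using scikit-learn's Random Forest, not the paper's actual pipeline: the three features and the toy data below are hypothetical stand-ins for the paper's 15 contextual, quantitative, and qualitative features.

```python
# Hedged sketch: binary citation-importance classification with a
# Random Forest, in the spirit of the paper. Features and data are
# invented for illustration only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical features per citation:
# [times the reference is cited in the citing paper,
#  appears only in the related-work section (0/1),
#  length of the citation-context sentence in words]
X = [
    [5, 0, 32], [4, 0, 28], [6, 0, 40], [3, 0, 25],  # labeled important
    [1, 1, 12], [1, 1, 10], [2, 1, 15], [1, 1, 9],   # labeled unimportant
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = important, 0 = unimportant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(preds)
```

In the paper's search system, the positive predictions of such a classifier decide which cited works contribute their terms to the citing document's index entry.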
ISSN: | 0138-9130 1588-2861 |
DOI: | 10.1007/s11192-021-03986-z |