An in-text citation classification predictive model for a scholarly search system

Bibliographic details
Published in:	Scientometrics 2021-07, Vol.126 (7), p.5509-5529
Authors:	Aljohani, Naif Radi; Fayoumi, Ayman; Hassan, Saeed-Ul
Format:	Article
Language:	English
Subjects:	Bibliometrics; Citations; Classification; Computational linguistics; Computer applications; Computer Science; Decision trees; Information Storage and Retrieval; Learning algorithms; Library Science; Linguistics; Machine learning; Mental task performance; Prediction models; Prototypes; Searching; Support vector machines; Visibility
Online access:	Full text

container_end_page	5529
container_issue	7
container_start_page	5509
container_title	Scientometrics
container_volume	126
creator	Aljohani, Naif Radi; Fayoumi, Ayman; Hassan, Saeed-Ul
description	We argue that citations in scholarly documents do not all perform the same function or carry equal importance. To address this problem, we worked with a corpus of over 21,000 citations from the Association for Computational Linguistics, from which 465 randomly selected citations were annotated by experts as either important or unimportant. We applied three machine-learning models to these annotated citations: Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). For the classification task, the models used 15 novel features of three kinds: contextual, quantitative, and qualitative. We show that the RF model outperformed the comparative model by 9.52%, achieving an area under the precision-recall curve of 92%. We also present a prototype scientific publication search system that uses the RF prediction model for feature engineering. The prototype was applied to a dataset of 4,138 full-text PLOS ONE articles containing 31,839 unique references. The empirical evaluation shows that the proposed search system improves the visibility of a given scientific document by indexing it not only under its own index terms but also under terms from the cited works that are predicted to be important. Overall, this yields improved search results for users' queries.
doi_str_mv	10.1007/s11192-021-03986-z
format	Article
publisher	Cham: Springer International Publishing; Springer Nature B.V.
rights	Akadémiai Kiadó, Budapest, Hungary 2021
status	Peer reviewed
orcid	https://orcid.org/0000-0002-6509-9190
identifier	ISSN: 0138-9130
ispartof	Scientometrics, 2021-07, Vol.126 (7), p.5509-5529
issn	0138-9130 (print); 1588-2861 (electronic)
language	eng
recordid	cdi_proquest_journals_2544895453
source	SpringerLink Journals - AutoHoldings
subjects	Bibliometrics; Citations; Classification; Computational linguistics; Computer applications; Computer Science; Decision trees; Information Storage and Retrieval; Learning algorithms; Library Science; Linguistics; Machine learning; Mental task performance; Prediction models; Prototypes; Searching; Support vector machines; Visibility
title	An in-text citation classification predictive model for a scholarly search system
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A33%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20in-text%20citation%20classification%20predictive%20model%20for%20a%20scholarly%20search%20system&rft.jtitle=Scientometrics&rft.au=Aljohani,%20Naif%20Radi&rft.date=2021-07-01&rft.volume=126&rft.issue=7&rft.spage=5509&rft.epage=5529&rft.pages=5509-5529&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-021-03986-z&rft_dat=%3Cproquest_cross%3E2544895453%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2544895453&rft_id=info:pmid/&rfr_iscdi=true
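The search-side idea in the description above (indexing a paper under its own terms plus terms from the cited works predicted to be important) can be illustrated with a small sketch. This is not the authors' implementation: the `Paper` class, `tokenize`, `build_index`, `search`, the regex tokenizer, and the term-count ranking are all hypothetical stand-ins, and the importance labels are supplied by hand rather than produced by the paper's Random Forest classifier.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record: a paper's text plus its outgoing citations.

    Each citation is (cited_paper_id, predicted_important). The boolean stands
    in for the output of a citation-importance classifier; here it is set by hand.
    """
    paper_id: str
    text: str
    citations: list[tuple[str, bool]] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for index-term extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(papers: dict[str, Paper]) -> dict[str, Counter]:
    """Index each paper under its own terms plus the terms of every cited work
    predicted to be important. Citations predicted unimportant add nothing."""
    index: dict[str, Counter] = {}
    for pid, paper in papers.items():
        terms = Counter(tokenize(paper.text))
        for cited_id, important in paper.citations:
            if important and cited_id in papers:
                terms.update(tokenize(papers[cited_id].text))
        index[pid] = terms
    return index


def search(index: dict[str, Counter], query: str) -> list[str]:
    """Return IDs of papers matching any query term, ranked by summed term
    frequency, ties broken by ID. Counter yields 0 for absent terms."""
    query_terms = tokenize(query)
    scores = {pid: sum(terms[t] for t in query_terms) for pid, terms in index.items()}
    hits = [pid for pid, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pid: (-scores[pid], pid))


# Toy corpus (hypothetical): A cites B (predicted important) and C (predicted unimportant).
papers = {
    "A": Paper("A", "Random forest classifier for citations", [("B", True), ("C", False)]),
    "B": Paper("B", "Support vector machine baseline"),
    "C": Paper("C", "Decision tree pruning"),
}
index = build_index(papers)
print(search(index, "support vector machine"))  # → ['A', 'B']  (A found via important citation B)
print(search(index, "decision tree"))           # → ['C']  (A's citation to C predicted unimportant)
```

In this sketch the expansion happens at indexing time, so queries need no changes; a production system would more likely use a proper term extractor and a ranking function such as BM25 instead of raw term counts.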