An in-text citation classification predictive model for a scholarly search system
We argue that citations in scholarly documents do not always perform equivalent functions or possess equal importance. To address this problem, we worked with a corpus of over 21 k citations from the Association for Computational Linguistics, from which 465 citations were randomly annotated by exper...
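Published in: | Scientometrics 2021-07, Vol.126 (7), p.5509-5529 |
---|---|
Main authors: | Aljohani, Naif Radi; Fayoumi, Ayman; Hassan, Saeed-Ul |
Format: | Article |
Language: | eng |
Subjects: | Bibliometrics; Citations; Classification; Computational linguistics; Computer applications; Computer Science; Decision trees; Information Storage and Retrieval; Learning algorithms; Library Science; Linguistics; Machine learning; Mental task performance; Prediction models; Prototypes; Searching; Support vector machines; Visibility |
Online access: | Full text |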
container_end_page | 5529 |
---|---|
container_issue | 7 |
container_start_page | 5509 |
container_title | Scientometrics |
container_volume | 126 |
creator | Aljohani, Naif Radi; Fayoumi, Ayman; Hassan, Saeed-Ul |
description | We argue that citations in scholarly documents do not always perform equivalent functions or possess equal importance. To address this problem, we worked with a corpus of over 21 k citations from the Association for Computational Linguistics, from which 465 citations were randomly annotated by experts as either important or unimportant. We used an array of machine-learning models on these annotated citations: Random Forest (RF); Support Vector Machine (SVM); and Decision Tree (DT). For the classification task, the selected models employed 15 novel features: contextual; quantitative; and qualitative. We show that the RF model outperformed the comparative model by 9.52%, achieving a 92% precision-recall area under the curve. We present a prototype of a scientific publication search system based on the RF prediction model for feature engineering. This was used on a dataset of 4138 full-text articles indexed by PLOS ONE that consists of 31,839 unique references. The empirical evaluation shows that the proposed search system improves visibility of a given scientific document by including, along with its index terms, terms from the works that it cites that are predicted to be important. Overall, this yields improved search results against the queries by the user. |
doi_str_mv | 10.1007/s11192-021-03986-z |
format | Article |
publisher | Cham: Springer International Publishing |
rights | Akadémiai Kiadó, Budapest, Hungary 2021 |
eissn | 1588-2861 |
orcid | https://orcid.org/0000-0002-6509-9190 |
tpages | 21 |
toplevel | peer_reviewed; online_resources |
collection | CrossRef; Linguistics and Language Behavior Abstracts (LLBA); Library & Information Science Abstracts (LISA) |
fulltext | fulltext |
identifier | ISSN: 0138-9130 |
ispartof | Scientometrics, 2021-07, Vol.126 (7), p.5509-5529 |
issn | 0138-9130; 1588-2861 |
language | eng |
recordid | cdi_proquest_journals_2544895453 |
source | SpringerLink Journals - AutoHoldings |
subjects | Bibliometrics; Citations; Classification; Computational linguistics; Computer applications; Computer Science; Decision trees; Information Storage and Retrieval; Learning algorithms; Library Science; Linguistics; Machine learning; Mental task performance; Prediction models; Prototypes; Searching; Support vector machines; Visibility |
title | An in-text citation classification predictive model for a scholarly search system |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A33%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20in-text%20citation%20classification%20predictive%20model%20for%20a%20scholarly%20search%20system&rft.jtitle=Scientometrics&rft.au=Aljohani,%20Naif%20Radi&rft.date=2021-07-01&rft.volume=126&rft.issue=7&rft.spage=5509&rft.epage=5529&rft.pages=5509-5529&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-021-03986-z&rft_dat=%3Cproquest_cross%3E2544895453%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2544895453&rft_id=info:pmid/&rfr_iscdi=true |
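
The resolver link in the `url` row carries the citation as percent-encoded key/value pairs (`rft.atitle`, `rft.jtitle`, `rft.volume`, and so on). A minimal sketch of reading those values back out with Python's standard library follows. The `OPENURL` constant is a shortened copy of the link above that keeps only some of its keys, and the helper name `referent_fields` is hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the resolver link from the `url` row; only some of the
# original rft.* keys are kept so the example stays readable.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=An%20in-text%20citation%20classification%20predictive%20model"
    "%20for%20a%20scholarly%20search%20system"
    "&rft.jtitle=Scientometrics"
    "&rft.au=Aljohani,%20Naif%20Radi"
    "&rft.date=2021-07-01&rft.volume=126&rft.issue=7"
    "&rft.spage=5509&rft.epage=5529"
    "&rft.issn=0138-9130&rft.eissn=1588-2861"
    "&rft_id=info:doi/10.1007/s11192-021-03986-z"
)


def referent_fields(url):
    """Return the rft.* values of an OpenURL as a plain dict (hypothetical helper)."""
    # parse_qs splits on '&' and decodes percent-escapes such as %20.
    query = parse_qs(urlsplit(url).query)
    # Keep only the keys describing the cited item and drop the 'rft.' prefix.
    fields = {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }
    # The DOI travels separately as an rft_id value of the form info:doi/<doi>.
    for ident in query.get("rft_id", []):
        if ident.startswith("info:doi/"):
            fields["doi"] = ident[len("info:doi/"):]
    return fields


citation = referent_fields(OPENURL)
print(citation["jtitle"], citation["volume"], citation["spage"], citation["doi"])
```

Applied to the full link, the same helper would also pick up the `rft.pages` key that this shortened copy leaves out.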