Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study
In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included pub...
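The abstract describes classifying thesis/publication candidate pairs from features of both bibliographic records. This catalog record does not list the features used in the study, so the following is only a minimal sketch of what pairwise feature extraction could look like. The `Record` type, the three features, and the sample records are all hypothetical; the resulting feature dictionaries would be the input to a supervised classifier such as a random forest, which is not shown here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical minimal bibliographic record (not the study's schema)."""
    title: str
    author: str  # "Family, Given"
    year: int


def family_name(author: str) -> str:
    # Take the part before the first comma and normalise case and whitespace.
    return author.split(",")[0].strip().lower()


def pair_features(thesis: Record, pub: Record) -> dict:
    """Compute illustrative features for one thesis/publication candidate pair."""
    title_similarity = SequenceMatcher(
        None, thesis.title.lower(), pub.title.lower()
    ).ratio()
    return {
        # Character-level similarity of the two titles, between 0 and 1.
        "title_similarity": title_similarity,
        # 1.0 if both records share the same family name, else 0.0.
        "same_family_name": float(family_name(thesis.author) == family_name(pub.author)),
        # Years between the publication and the thesis (positive: paper came first).
        "year_gap": thesis.year - pub.year,
    }


# Invented sample records, for illustration only.
thesis = Record("Example study of topic A", "Doe, Jane", 2020)
paper = Record("Example study of topic A", "Doe, J.", 2018)
features = pair_features(thesis, paper)
```

Each candidate pair yields one such feature vector; pairs known to belong together (for example from a manually curated list) would supply the positive training labels.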
Published in: | Scientometrics 2022-10, Vol.127 (10), p.5863-5877 |
---|---|
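Main author: | Donner, Paul |
Format: | Article |
Language: | eng |
Subjects: | Academic publications; Algorithms; Classification; Computer Science; Dissertations & theses; Information Storage and Retrieval; Library Science |
Online access: | Full text |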
container_end_page | 5877 |
---|---|
container_issue | 10 |
container_start_page | 5863 |
container_title | Scientometrics |
container_volume | 127 |
creator | Donner, Paul |
description | In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among high numbers of potential candidates using features of the thesis and publication records. The results indicate that this approach results in good match quality in general and with the best results attained by the “random forest” classification algorithm. |
doi_str_mv | 10.1007/s11192-022-04480-w |
format | Article |
publisher | Cham: Springer International Publishing |
rights | The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
orcid | https://orcid.org/0000-0001-5737-8483 |
tpages | 15 |
oa | free_for_read |
toplevel | peer_reviewed; online_resources |
fulltext | fulltext |
identifier | ISSN: 0138-9130 |
ispartof | Scientometrics, 2022-10, Vol.127 (10), p.5863-5877 |
issn | 0138-9130 (ISSN); 1588-2861 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2714473827 |
source | SpringerLink Journals - AutoHoldings |
subjects | Academic publications; Algorithms; Classification; Computer Science; Dissertations & theses; Information Storage and Retrieval; Library Science |
title | Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T08%3A42%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Algorithmic%20identification%20of%20Ph.D.%C2%A0thesis-related%20publications:%20a%20proof-of-concept%20study&rft.jtitle=Scientometrics&rft.au=Donner,%20Paul&rft.date=2022-10-01&rft.volume=127&rft.issue=10&rft.spage=5863&rft.epage=5877&rft.pages=5863-5877&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-022-04480-w&rft_dat=%3Cproquest_cross%3E2714473827%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2714473827&rft_id=info:pmid/&rfr_iscdi=true |
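The `url` field above is an OpenURL link whose `rft.*` query parameters repeat the bibliographic data of this record. As a minimal sketch, the citation fields can be recovered from such a link with Python's standard library. The helper name `parse_openurl` is hypothetical, and `EXAMPLE_URL` is a shortened copy of the link above that keeps only some of its parameters.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the resolver link from the record (only some rft.* keys kept).
EXAMPLE_URL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article&rft.jtitle=Scientometrics&rft.au=Donner,%20Paul"
    "&rft.date=2022-10-01&rft.volume=127&rft.issue=10"
    "&rft.spage=5863&rft.epage=5877&rft.issn=0138-9130&rft.eissn=1588-2861"
    "&rft_id=info:doi/10.1007/s11192-022-04480-w"
)


def parse_openurl(url: str) -> dict:
    """Return the rft.* (referent) fields of an OpenURL as a plain dict."""
    params = parse_qs(urlsplit(url).query)  # percent-decoding happens here
    # Keep only referent metadata keys and strip the "rft." prefix;
    # each key occurs once in this link, so the first value is taken.
    return {
        key[len("rft."):]: values[0]
        for key, values in params.items()
        if key.startswith("rft.")
    }


fields = parse_openurl(EXAMPLE_URL)
```

Keys such as `rft_id` use an underscore rather than a dot, so this sketch leaves them out; a fuller version would handle them separately because they can occur more than once in one link.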