Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study

In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included pub...
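
The abstract describes matching a thesis to candidate publications using the bibliographical data of both items. As a rough illustration only, the sketch below computes a few pairwise features for one thesis and one candidate publication. The `Record` schema, the three features, and the sample records are hypothetical and are not taken from the study; in a supervised setup such feature rows would be labeled using a ground truth dataset and passed to a classifier such as a random forest.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    """Minimal bibliographic record (hypothetical schema, not the study's)."""
    title: str
    author: str  # expected form: "Family, Given"
    year: int


def family_name(author: str) -> str:
    """Return the lower-cased part of the name before the first comma."""
    return author.split(",")[0].strip().lower()


def pair_features(thesis: Record, pub: Record) -> dict:
    """Compute illustrative features for one thesis/publication candidate pair."""
    title_similarity = SequenceMatcher(
        None, thesis.title.lower(), pub.title.lower()
    ).ratio()
    return {
        # Character-level similarity of the two titles, between 0.0 and 1.0.
        "title_similarity": title_similarity,
        # Whether both records share the same author family name.
        "same_family_name": family_name(thesis.author) == family_name(pub.author),
        # Absolute distance in years between thesis and publication.
        "year_gap": abs(thesis.year - pub.year),
    }


# Made-up records, for illustration only.
thesis = Record("Sediment transport in alpine rivers", "Example, Erika", 2020)
candidate = Record("Sediment transport in alpine rivers", "Example, E.", 2019)
print(pair_features(thesis, candidate))
```

Each candidate pair becomes one feature row. Real records would likely need more robust name and title normalization than this sketch applies.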

Bibliographic details
Published in: Scientometrics 2022-10, Vol.127 (10), p.5863-5877
Main author: Donner, Paul
Format: Article
Language: eng
Online access: Full text
container_end_page 5877
container_issue 10
container_start_page 5863
container_title Scientometrics
container_volume 127
creator Donner, Paul
description In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among large numbers of potential candidates using features of the thesis and publication records. The results indicate that this approach yields good match quality overall, with the best results attained by the “random forest” classification algorithm.
doi_str_mv 10.1007/s11192-022-04480-w
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2714473827</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2714473827</sourcerecordid><originalsourceid>FETCH-LOGICAL-c293t-c2708b3af9269ad6a65e16f5ea4c6d8bddf8548bcfbd19a0e6ffafb39a77838a3</originalsourceid><addsrcrecordid>eNp9kMtKAzEUhoMoWKsv4GrAdWouc8m4K_UKBV3oxk3I5NKmTCdjkqH0bXwWn8zoFNwJ57L5v_8cfgAuMZphhKrrgDGuCUQkdZ4zBHdHYIILxiBhJT4GE4QpgzWm6BSchbBBCaKITcD7vF05b-N6a2Vmle6iNVaKaF2XOZO9rGe3s6_PuNbBBuh1K6JWWT807UEUbjKR9d45A1NJ10ndxyzEQe3PwYkRbdAXhz0Fb_d3r4tHuHx-eFrMl1CSmsY0K8QaKkxNylqoUpSFxqUptMhlqVijlGFFzhppGoVrgXRpjDANrUVVMcoEnYKr0Te98THoEPnGDb5LJzmpcJ5XlJEqqciokt6F4LXhvbdb4fccI_6TIR8z5ClD_psh3yWIjlBI4m6l_Z_1P9Q3qI53gw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2714473827</pqid></control><display><type>article</type><title>Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study</title><source>SpringerLink Journals - AutoHoldings</source><creator>Donner, Paul</creator><creatorcontrib>Donner, Paul</creatorcontrib><description>In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among high numbers of potential candidates using features of the thesis and publication records. 
The results indicate that this approach results in good match quality in general and with the best results attained by the “random forest” classification algorithm.</description><identifier>ISSN: 0138-9130</identifier><identifier>EISSN: 1588-2861</identifier><identifier>DOI: 10.1007/s11192-022-04480-w</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Academic publications ; Algorithms ; Classification ; Computer Science ; Dissertations &amp; theses ; Information Storage and Retrieval ; Library Science</subject><ispartof>Scientometrics, 2022-10, Vol.127 (10), p.5863-5877</ispartof><rights>The Author(s) 2022</rights><rights>The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c293t-c2708b3af9269ad6a65e16f5ea4c6d8bddf8548bcfbd19a0e6ffafb39a77838a3</citedby><cites>FETCH-LOGICAL-c293t-c2708b3af9269ad6a65e16f5ea4c6d8bddf8548bcfbd19a0e6ffafb39a77838a3</cites><orcidid>0000-0001-5737-8483</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11192-022-04480-w$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11192-022-04480-w$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Donner, Paul</creatorcontrib><title>Algorithmic identification of Ph.D. 
thesis-related publications: a proof-of-concept study</title><title>Scientometrics</title><addtitle>Scientometrics</addtitle><description>In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among high numbers of potential candidates using features of the thesis and publication records. The results indicate that this approach results in good match quality in general and with the best results attained by the “random forest” classification algorithm.</description><subject>Academic publications</subject><subject>Algorithms</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Dissertations &amp; theses</subject><subject>Information Storage and Retrieval</subject><subject>Library Science</subject><issn>0138-9130</issn><issn>1588-2861</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><recordid>eNp9kMtKAzEUhoMoWKsv4GrAdWouc8m4K_UKBV3oxk3I5NKmTCdjkqH0bXwWn8zoFNwJ57L5v_8cfgAuMZphhKrrgDGuCUQkdZ4zBHdHYIILxiBhJT4GE4QpgzWm6BSchbBBCaKITcD7vF05b-N6a2Vmle6iNVaKaF2XOZO9rGe3s6_PuNbBBuh1K6JWWT807UEUbjKR9d45A1NJ10ndxyzEQe3PwYkRbdAXhz0Fb_d3r4tHuHx-eFrMl1CSmsY0K8QaKkxNylqoUpSFxqUptMhlqVijlGFFzhppGoVrgXRpjDANrUVVMcoEnYKr0Te98THoEPnGDb5LJzmpcJ5XlJEqqciokt6F4LXhvbdb4fccI_6TIR8z5ClD_psh3yWIjlBI4m6l_Z_1P9Q3qI53gw</recordid><startdate>20221001</startdate><enddate>20221001</enddate><creator>Donner, Paul</creator><general>Springer International Publishing</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><orcidid>https://orcid.org/0000-0001-5737-8483</orcidid></search><sort><creationdate>20221001</creationdate><title>Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study</title><author>Donner, Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c293t-c2708b3af9269ad6a65e16f5ea4c6d8bddf8548bcfbd19a0e6ffafb39a77838a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Academic publications</topic><topic>Algorithms</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Dissertations &amp; theses</topic><topic>Information Storage and Retrieval</topic><topic>Library Science</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Donner, Paul</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><jtitle>Scientometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Donner, Paul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study</atitle><jtitle>Scientometrics</jtitle><stitle>Scientometrics</stitle><date>2022-10-01</date><risdate>2022</risdate><volume>127</volume><issue>10</issue><spage>5863</spage><epage>5877</epage><pages>5863-5877</pages><issn>0138-9130</issn><eissn>1588-2861</eissn><abstract>In this study we propose and evaluate a method to automatically identify the journal publications that are related to a Ph.D. thesis using bibliographical data of both items. 
We build a manually curated ground truth dataset from German cumulative doctoral theses that explicitly list the included publications, which we match with records in the Scopus database. We then test supervised classification methods on the task of identifying the correct associated publications among high numbers of potential candidates using features of the thesis and publication records. The results indicate that this approach results in good match quality in general and with the best results attained by the “random forest” classification algorithm.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><doi>10.1007/s11192-022-04480-w</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0001-5737-8483</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0138-9130
ispartof Scientometrics, 2022-10, Vol.127 (10), p.5863-5877
issn 0138-9130
1588-2861
language eng
recordid cdi_proquest_journals_2714473827
source SpringerLink Journals - AutoHoldings
subjects Academic publications
Algorithms
Classification
Computer Science
Dissertations & theses
Information Storage and Retrieval
Library Science
title Algorithmic identification of Ph.D. thesis-related publications: a proof-of-concept study
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T08%3A42%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Algorithmic%20identification%20of%20Ph.D.%C2%A0thesis-related%20publications:%20a%20proof-of-concept%20study&rft.jtitle=Scientometrics&rft.au=Donner,%20Paul&rft.date=2022-10-01&rft.volume=127&rft.issue=10&rft.spage=5863&rft.epage=5877&rft.pages=5863-5877&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-022-04480-w&rft_dat=%3Cproquest_cross%3E2714473827%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2714473827&rft_id=info:pmid/&rfr_iscdi=true