SiSOB data extraction and codification: A tool to analyze scientific careers

• The paper describes the methodology and software tool used to build a database on the careers and productivity of academics.
• The software provides data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database.
• The software develop...

Bibliographic details
Published in: Research policy 2015-11, Vol.44 (9), p.1645-1658
Main authors: Geuna, Aldo, Kataishi, Rodrigo, Toselli, Manuel, Guzmán, Eduardo, Lawson, Cornelia, Fernandez-Zubieta, Ana, Barros, Beatriz
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1658
container_issue 9
container_start_page 1645
container_title Research policy
container_volume 44
creator Geuna, Aldo
Kataishi, Rodrigo
Toselli, Manuel
Guzmán, Eduardo
Lawson, Cornelia
Fernandez-Zubieta, Ana
Barros, Beatriz
description • The paper describes the methodology and software tool used to build a database on the careers and productivity of academics.
• The software provides data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database.
• The software developed is released as free software under the GNU General Public License.
• The methodology and software tool are validated for a sample of US and UK biomedical scientists.
• We show that CVs are a valuable source of data to identify the exact point and type of academic mobility.
• It is important to differentiate between voluntary and forced mobility, as only the former is associated with higher research performance.
This paper describes the methodology and software tool used to build a database on the careers and productivity of academics, using public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institutes of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The tool’s structured outputs can be used for either econometric research or data representation for policy analysis. The methodology and software tool are validated for a sample of US and UK biomedical scientists, but can be applied to any country where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database, and of the data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database. We describe the database and the effectiveness of our algorithms and provide suggestions for further improvements. The software developed is released as free software under the GNU General Public License; the aim is for it to be available to the community of social scientists and economists interested in analyzing scientific production and scientific careers, who, it is hoped, will develop the tool further.
doi_str_mv 10.1016/j.respol.2015.01.017
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1758937792</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0048733315000190</els_id><sourcerecordid>3782931221</sourcerecordid><originalsourceid>FETCH-LOGICAL-c477t-6bb96bc993886aab61f1ed4cc5ac063584b2fbf7eefa606ec56c8c9f451bc09c3</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouK5-Aw8FL15aJ9smaTwIq_gPFjyo55BOp5ClbtakK66f3tT15EF4zMDM7w3MY-yUQ8GBy4tlESiufV_MgIsCeJLaYxNeqzJXcib22QSgqnNVluUhO4pxCQC8Aj1hi2f3_HSdtXawGX0OweLg_CqzqzZD37rOoR0Hl9k8G7zvU0k722-_KIvoaDWMSIY2EIV4zA4620c6-e1T9np3-3LzkC-e7h9v5oscK6WGXDaNlg1qXda1tLaRvOPUVojCIshS1FUz65pOEXVWgiQUEmvUXSV4g6CxnLLz3d118O8bioN5cxGp7-2K_CYarkStS6X0LKFnf9Cl34T0wUiBELzSNSSq2lEYfIyBOrMO7s2GreFgxojN0uwiNmPEBniSSrarnY3Ssx-OgvnJBKl1gXAwrXf_H_gGw1KGyw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1705514980</pqid></control><display><type>article</type><title>SiSOB data extraction and codification: A tool to analyze scientific careers</title><source>Access via ScienceDirect (Elsevier)</source><creator>Geuna, Aldo ; Kataishi, Rodrigo ; Toselli, Manuel ; Guzmán, Eduardo ; Lawson, Cornelia ; Fernandez-Zubieta, Ana ; Barros, Beatriz</creator><creatorcontrib>Geuna, Aldo ; Kataishi, Rodrigo ; Toselli, Manuel ; Guzmán, Eduardo ; Lawson, Cornelia ; Fernandez-Zubieta, Ana ; Barros, Beatriz</creatorcontrib><description>•The paper describes the methodology and software tool used to build a database on the careers and productivity of academics.•The software provides data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database.•The software developed is released under free software GNU General Public License.•The methodology and software tool are validated for a sample of US and UK biomedical scientists.•We show that CVs are a valuable source of data to identify the 
exact point and type of academic mobility.•It is important to differentiate between voluntary and forced mobility, as only the former is associated with higher research performance. This paper describes the methodology and software tool used to build a database on the careers and productivity of academics, using public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institute of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The tool’s structured outputs can be used for either econometric research or data representation for policy analysis. The methodology and software tool is validated for a sample of US and UK biomedical scientists, but can be applied to any countries where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database, and the data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database. We describe the database and the effectiveness of our algorithms and provide suggestions for further improvements. 
The software developed is released under free software GNU General Public License; the aim is for it to be available to the community of social scientists and economists interested in analyzing scientific production and scientific careers, who it is hoped will develop this tool further.</description><identifier>ISSN: 0048-7333</identifier><identifier>EISSN: 1873-7625</identifier><identifier>DOI: 10.1016/j.respol.2015.01.017</identifier><identifier>CODEN: REPYBP</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Academic careers ; Biological research ; Biotechnology ; Careers ; Data analysis ; Data mining ; Extraction and data integration ; Information retrieval ; Mobility of research scientists ; Public information ; Research productivity ; Scientists ; Software ; Studies ; United Kingdom ; United States</subject><ispartof>Research policy, 2015-11, Vol.44 (9), p.1645-1658</ispartof><rights>2015 The Authors</rights><rights>Copyright Elsevier Sequoia S.A. 
Nov 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c477t-6bb96bc993886aab61f1ed4cc5ac063584b2fbf7eefa606ec56c8c9f451bc09c3</citedby><cites>FETCH-LOGICAL-c477t-6bb96bc993886aab61f1ed4cc5ac063584b2fbf7eefa606ec56c8c9f451bc09c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.respol.2015.01.017$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Geuna, Aldo</creatorcontrib><creatorcontrib>Kataishi, Rodrigo</creatorcontrib><creatorcontrib>Toselli, Manuel</creatorcontrib><creatorcontrib>Guzmán, Eduardo</creatorcontrib><creatorcontrib>Lawson, Cornelia</creatorcontrib><creatorcontrib>Fernandez-Zubieta, Ana</creatorcontrib><creatorcontrib>Barros, Beatriz</creatorcontrib><title>SiSOB data extraction and codification: A tool to analyze scientific careers</title><title>Research policy</title><description>•The paper describes the methodology and software tool used to build a database on the careers and productivity of academics.•The software provides data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database.•The software developed is released under free software GNU General Public License.•The methodology and software tool are validated for a sample of US and UK biomedical scientists.•We show that CVs are a valuable source of data to identify the exact point and type of academic mobility.•It is important to differentiate between voluntary and forced mobility, as only the former is associated with higher research performance. 
This paper describes the methodology and software tool used to build a database on the careers and productivity of academics, using public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institute of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The tool’s structured outputs can be used for either econometric research or data representation for policy analysis. The methodology and software tool is validated for a sample of US and UK biomedical scientists, but can be applied to any countries where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database, and the data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database. We describe the database and the effectiveness of our algorithms and provide suggestions for further improvements. 
The software developed is released under free software GNU General Public License; the aim is for it to be available to the community of social scientists and economists interested in analyzing scientific production and scientific careers, who it is hoped will develop this tool further.</description><subject>Academic careers</subject><subject>Biological research</subject><subject>Biotechnology</subject><subject>Careers</subject><subject>Data analysis</subject><subject>Data mining</subject><subject>Extraction and data integration</subject><subject>Information retrieval</subject><subject>Mobility of research scientists</subject><subject>Public information</subject><subject>Research productivity</subject><subject>Scientists</subject><subject>Software</subject><subject>Studies</subject><subject>United Kingdom</subject><subject>United States</subject><issn>0048-7333</issn><issn>1873-7625</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LxDAQxYMouK5-Aw8FL15aJ9smaTwIq_gPFjyo55BOp5ClbtakK66f3tT15EF4zMDM7w3MY-yUQ8GBy4tlESiufV_MgIsCeJLaYxNeqzJXcib22QSgqnNVluUhO4pxCQC8Aj1hi2f3_HSdtXawGX0OweLg_CqzqzZD37rOoR0Hl9k8G7zvU0k722-_KIvoaDWMSIY2EIV4zA4620c6-e1T9np3-3LzkC-e7h9v5oscK6WGXDaNlg1qXda1tLaRvOPUVojCIshS1FUz65pOEXVWgiQUEmvUXSV4g6CxnLLz3d118O8bioN5cxGp7-2K_CYarkStS6X0LKFnf9Cl34T0wUiBELzSNSSq2lEYfIyBOrMO7s2GreFgxojN0uwiNmPEBniSSrarnY3Ssx-OgvnJBKl1gXAwrXf_H_gGw1KGyw</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Geuna, Aldo</creator><creator>Kataishi, Rodrigo</creator><creator>Toselli, Manuel</creator><creator>Guzmán, Eduardo</creator><creator>Lawson, Cornelia</creator><creator>Fernandez-Zubieta, Ana</creator><creator>Barros, Beatriz</creator><general>Elsevier B.V</general><general>Elsevier Sequoia 
S.A</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8BJ</scope><scope>FQK</scope><scope>JBE</scope><scope>JQ2</scope></search><sort><creationdate>20151101</creationdate><title>SiSOB data extraction and codification: A tool to analyze scientific careers</title><author>Geuna, Aldo ; Kataishi, Rodrigo ; Toselli, Manuel ; Guzmán, Eduardo ; Lawson, Cornelia ; Fernandez-Zubieta, Ana ; Barros, Beatriz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c477t-6bb96bc993886aab61f1ed4cc5ac063584b2fbf7eefa606ec56c8c9f451bc09c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Academic careers</topic><topic>Biological research</topic><topic>Biotechnology</topic><topic>Careers</topic><topic>Data analysis</topic><topic>Data mining</topic><topic>Extraction and data integration</topic><topic>Information retrieval</topic><topic>Mobility of research scientists</topic><topic>Public information</topic><topic>Research productivity</topic><topic>Scientists</topic><topic>Software</topic><topic>Studies</topic><topic>United Kingdom</topic><topic>United States</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Geuna, Aldo</creatorcontrib><creatorcontrib>Kataishi, Rodrigo</creatorcontrib><creatorcontrib>Toselli, Manuel</creatorcontrib><creatorcontrib>Guzmán, Eduardo</creatorcontrib><creatorcontrib>Lawson, Cornelia</creatorcontrib><creatorcontrib>Fernandez-Zubieta, Ana</creatorcontrib><creatorcontrib>Barros, Beatriz</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social 
Sciences</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Research policy</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Geuna, Aldo</au><au>Kataishi, Rodrigo</au><au>Toselli, Manuel</au><au>Guzmán, Eduardo</au><au>Lawson, Cornelia</au><au>Fernandez-Zubieta, Ana</au><au>Barros, Beatriz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SiSOB data extraction and codification: A tool to analyze scientific careers</atitle><jtitle>Research policy</jtitle><date>2015-11-01</date><risdate>2015</risdate><volume>44</volume><issue>9</issue><spage>1645</spage><epage>1658</epage><pages>1645-1658</pages><issn>0048-7333</issn><eissn>1873-7625</eissn><coden>REPYBP</coden><abstract>•The paper describes the methodology and software tool used to build a database on the careers and productivity of academics.•The software provides data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database.•The software developed is released under free software GNU General Public License.•The methodology and software tool are validated for a sample of US and UK biomedical scientists.•We show that CVs are a valuable source of data to identify the exact point and type of academic mobility.•It is important to differentiate between voluntary and forced mobility, as only the former is associated with higher research performance. This paper describes the methodology and software tool used to build a database on the careers and productivity of academics, using public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institute of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). 
The tool’s structured outputs can be used for either econometric research or data representation for policy analysis. The methodology and software tool is validated for a sample of US and UK biomedical scientists, but can be applied to any countries where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database, and the data crawling and data mining techniques used to transform webpage-based information and CV information into a relational database. We describe the database and the effectiveness of our algorithms and provide suggestions for further improvements. The software developed is released under free software GNU General Public License; the aim is for it to be available to the community of social scientists and economists interested in analyzing scientific production and scientific careers, who it is hoped will develop this tool further.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.respol.2015.01.017</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0048-7333
ispartof Research policy, 2015-11, Vol.44 (9), p.1645-1658
issn 0048-7333
1873-7625
language eng
recordid cdi_proquest_miscellaneous_1758937792
source Access via ScienceDirect (Elsevier)
subjects Academic careers
Biological research
Biotechnology
Careers
Data analysis
Data mining
Extraction and data integration
Information retrieval
Mobility of research scientists
Public information
Research productivity
Scientists
Software
Studies
United Kingdom
United States
title SiSOB data extraction and codification: A tool to analyze scientific careers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A35%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SiSOB%20data%20extraction%20and%20codification:%20A%20tool%20to%20analyze%20scientific%20careers&rft.jtitle=Research%20policy&rft.au=Geuna,%20Aldo&rft.date=2015-11-01&rft.volume=44&rft.issue=9&rft.spage=1645&rft.epage=1658&rft.pages=1645-1658&rft.issn=0048-7333&rft.eissn=1873-7625&rft.coden=REPYBP&rft_id=info:doi/10.1016/j.respol.2015.01.017&rft_dat=%3Cproquest_cross%3E3782931221%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1705514980&rft_id=info:pmid/&rft_els_id=S0048733315000190&rfr_iscdi=true
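The resolver link in the url field is an OpenURL (ctx_ver=Z39.88-2004) that carries the citation as percent-encoded key/value pairs. As a minimal sketch, assuming only Python's standard library, the rft.* (referent) keys can be decoded back into the bibliographic fields listed in this record. The function name `openurl_citation` is hypothetical, and the example URL is a shortened copy of the link above that keeps only some of its keys.

```python
from urllib.parse import urlsplit, parse_qs


def openurl_citation(url: str) -> dict:
    """Return the rft.* (referent) fields of an OpenURL as a flat dict.

    Hypothetical helper for illustration; it keeps the first value of
    each key and ignores non-referent keys such as ctx_ver or rft_id.
    """
    query = parse_qs(urlsplit(url).query)
    return {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }


# Shortened copy of the resolver link in this record (subset of its keys).
example = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Research%20policy"
    "&rft.au=Geuna,%20Aldo"
    "&rft.date=2015-11-01"
    "&rft.volume=44&rft.issue=9&rft.spage=1645&rft.epage=1658"
    "&rft.issn=0048-7333"
)

citation = openurl_citation(example)
print(citation["jtitle"], citation["volume"], citation["spage"])
```

Keys that do not start with `rft.` (for example `rft_id`, which holds the DOI in the full link) are deliberately skipped here; a fuller parser would need to handle them separately, along with keys that repeat.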