Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness

Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in re...

Bibliographic Details
Published in: arXiv.org 2024-09
Main authors: Céspedes, Lucía, Kozlowski, Diego, Pradier, Carolina, Maxime Holmberg Sainte-Marie, Natsumi, Solange Shokida, Benz, Pierre, Poitras, Constance, Anton Boudreau Ninkov, Ebrahimy, Saeideh, Philips Ayeni, Filali, Sarra, Li, Bing, Larivière, Vincent
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Céspedes, Lucía
Kozlowski, Diego
Pradier, Carolina
Maxime Holmberg Sainte-Marie
Natsumi, Solange Shokida
Benz, Pierre
Poitras, Constance
Anton Boudreau Ninkov
Ebrahimy, Saeideh
Philips Ayeni
Filali, Sarra
Li, Bing
Larivière, Vincent
description Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing. However, more work is needed at infrastructural level to ensure the quality of metadata on language.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3106537843</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3106537843</sourcerecordid><originalsourceid>FETCH-proquest_journals_31065378433</originalsourceid><addsrcrecordid>eNqNi0EKwjAQRYMgWNQ7DLgW0qS14q5IxYXixr0M6agtNamdpOjtreABXP0H7_2RiJTW8XKdKDURc-ZaSqlWmUpTHQlT9NgE9JW9gb8THAYIFfvKwNb11OGNwF3h1JLNG3ptILeQMxPzg6z_qiN5LNEj5MaEDs0b0JbD-dE25MkO5UyMr9gwzX87FYtdcd7ul23nnoHYX2oXOjuoi47lKtXZOtH6v-oDlcZFlg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3106537843</pqid></control><display><type>article</type><title>Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness</title><source>Free E- Journals</source><creator>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Maxime Holmberg Sainte-Marie ; Natsumi, Solange Shokida ; Benz, Pierre ; Poitras, Constance ; Anton Boudreau Ninkov ; Ebrahimy, Saeideh ; Philips Ayeni ; Filali, Sarra ; Li, Bing ; Larivière, Vincent</creator><creatorcontrib>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Maxime Holmberg Sainte-Marie ; Natsumi, Solange Shokida ; Benz, Pierre ; Poitras, Constance ; Anton Boudreau Ninkov ; Ebrahimy, Saeideh ; Philips Ayeni ; Filali, Sarra ; Li, Bing ; Larivière, Vincent</creatorcontrib><description>Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. 
This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing. However, more work is needed at infrastructural level to ensure the quality of metadata on language.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Bibliometrics ; Completeness ; English language ; Languages ; Linguistics ; Metadata ; Quality assessment ; Research facilities ; Scholarly publishing</subject><ispartof>arXiv.org, 2024-09</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Céspedes, Lucía</creatorcontrib><creatorcontrib>Kozlowski, Diego</creatorcontrib><creatorcontrib>Pradier, Carolina</creatorcontrib><creatorcontrib>Maxime Holmberg Sainte-Marie</creatorcontrib><creatorcontrib>Natsumi, Solange Shokida</creatorcontrib><creatorcontrib>Benz, Pierre</creatorcontrib><creatorcontrib>Poitras, Constance</creatorcontrib><creatorcontrib>Anton Boudreau Ninkov</creatorcontrib><creatorcontrib>Ebrahimy, Saeideh</creatorcontrib><creatorcontrib>Philips Ayeni</creatorcontrib><creatorcontrib>Filali, Sarra</creatorcontrib><creatorcontrib>Li, Bing</creatorcontrib><creatorcontrib>Larivière, Vincent</creatorcontrib><title>Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness</title><title>arXiv.org</title><description>Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. 
Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing. However, more work is needed at infrastructural level to ensure the quality of metadata on language.</description><subject>Bibliometrics</subject><subject>Completeness</subject><subject>English language</subject><subject>Languages</subject><subject>Linguistics</subject><subject>Metadata</subject><subject>Quality assessment</subject><subject>Research facilities</subject><subject>Scholarly publishing</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNi0EKwjAQRYMgWNQ7DLgW0qS14q5IxYXixr0M6agtNamdpOjtreABXP0H7_2RiJTW8XKdKDURc-ZaSqlWmUpTHQlT9NgE9JW9gb8THAYIFfvKwNb11OGNwF3h1JLNG3ptILeQMxPzg6z_qiN5LNEj5MaEDs0b0JbD-dE25MkO5UyMr9gwzX87FYtdcd7ul23nnoHYX2oXOjuoi47lKtXZOtH6v-oDlcZFlg</recordid><startdate>20240919</startdate><enddate>20240919</enddate><creator>Céspedes, Lucía</creator><creator>Kozlowski, Diego</creator><creator>Pradier, Carolina</creator><creator>Maxime Holmberg Sainte-Marie</creator><creator>Natsumi, Solange Shokida</creator><creator>Benz, Pierre</creator><creator>Poitras, Constance</creator><creator>Anton Boudreau Ninkov</creator><creator>Ebrahimy, Saeideh</creator><creator>Philips Ayeni</creator><creator>Filali, Sarra</creator><creator>Li, Bing</creator><creator>Larivière, Vincent</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240919</creationdate><title>Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness</title><author>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Maxime Holmberg Sainte-Marie ; Natsumi, Solange Shokida ; Benz, Pierre ; Poitras, Constance ; Anton Boudreau Ninkov ; Ebrahimy, Saeideh ; Philips Ayeni ; Filali, Sarra ; Li, Bing ; Larivière, Vincent</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_31065378433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Bibliometrics</topic><topic>Completeness</topic><topic>English language</topic><topic>Languages</topic><topic>Linguistics</topic><topic>Metadata</topic><topic>Quality assessment</topic><topic>Research facilities</topic><topic>Scholarly publishing</topic><toplevel>online_resources</toplevel><creatorcontrib>Céspedes, Lucía</creatorcontrib><creatorcontrib>Kozlowski, Diego</creatorcontrib><creatorcontrib>Pradier, Carolina</creatorcontrib><creatorcontrib>Maxime Holmberg Sainte-Marie</creatorcontrib><creatorcontrib>Natsumi, Solange Shokida</creatorcontrib><creatorcontrib>Benz, Pierre</creatorcontrib><creatorcontrib>Poitras, Constance</creatorcontrib><creatorcontrib>Anton Boudreau Ninkov</creatorcontrib><creatorcontrib>Ebrahimy, Saeideh</creatorcontrib><creatorcontrib>Philips Ayeni</creatorcontrib><creatorcontrib>Filali, Sarra</creatorcontrib><creatorcontrib>Li, Bing</creatorcontrib><creatorcontrib>Larivière, 
Vincent</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Céspedes, Lucía</au><au>Kozlowski, Diego</au><au>Pradier, Carolina</au><au>Maxime Holmberg Sainte-Marie</au><au>Natsumi, Solange Shokida</au><au>Benz, Pierre</au><au>Poitras, Constance</au><au>Anton Boudreau Ninkov</au><au>Ebrahimy, Saeideh</au><au>Philips Ayeni</au><au>Filali, Sarra</au><au>Li, Bing</au><au>Larivière, Vincent</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness</atitle><jtitle>arXiv.org</jtitle><date>2024-09-19</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. 
Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing. However, more work is needed at infrastructural level to ensure the quality of metadata on language.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-09
issn 2331-8422
language eng
recordid cdi_proquest_journals_3106537843
source Free E- Journals
subjects Bibliometrics
Completeness
English language
Languages
Linguistics
Metadata
Quality assessment
Research facilities
Scholarly publishing
title Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T11%3A55%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Evaluating%20the%20Linguistic%20Coverage%20of%20OpenAlex:%20An%20Assessment%20of%20Metadata%20Accuracy%20and%20Completeness&rft.jtitle=arXiv.org&rft.au=C%C3%A9spedes,%20Luc%C3%ADa&rft.date=2024-09-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3106537843%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3106537843&rft_id=info:pmid/&rfr_iscdi=true