Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness

Bibliographic details
Published in: Journal of the Association for Information Science and Technology, 2025-01
Main authors: Céspedes, Lucía; Kozlowski, Diego; Pradier, Carolina; Sainte‐Marie, Maxime Holmberg; Shokida, Natsumi Solange; Benz, Pierre; Poitras, Constance; Ninkov, Anton Boudreau; Ebrahimy, Saeideh; Ayeni, Philips; Filali, Sarra; Li, Bing; Larivière, Vincent
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page
container_title Journal of the Association for Information Science and Technology
container_volume
creator Céspedes, Lucía
Kozlowski, Diego
Pradier, Carolina
Sainte‐Marie, Maxime Holmberg
Shokida, Natsumi Solange
Benz, Pierre
Poitras, Constance
Ninkov, Anton Boudreau
Ebrahimy, Saeideh
Ayeni, Philips
Filali, Sarra
Li, Bing
Larivière, Vincent
description Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English‐language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open‐source research information. While already in use by scholars and research institutions, the quality of its metadata is currently still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in‐depth manual validation of a sample of 6836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing, but more work is needed at infrastructural level to ensure the quality of metadata on language.
doi_str_mv 10.1002/asi.24979
format Article
fullrecord <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1002_asi_24979</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1002_asi_24979</sourcerecordid><originalsourceid>FETCH-LOGICAL-c154t-bd060a45ff1bf1c360119cefaeceecd31c7ad5f04330ffc980af2a62121a1fa93</originalsourceid><addsrcrecordid>eNo9kE1Lw0AQhhdRsNQe_Ad79ZA6s5ukjbdSqhUKveg5TCezNZIvspti_72piqf3gXdmYB6l7hHmCGAeyZdzE2eL7EpNjLUQYRrb63-2ya2aef8JAAjZMjE4UcXmRNVAoWyOOnyIrkYYSh9K1tyepKej6NbpfSfNqpIv_aRXjSbvxftamnDpaglUUCBNzENPfNbUFON23VUSpBkn79SNo8rL7C-n6v1587beRrv9y-t6tYsYkzhEhwJSoDhxDg8O2aaAmLE4EhbhwiIvqEgcxOM7znG2BHKGUoMGCR1ldqoefu9y33rfi8u7vqypP-cI-cVQPhrKfwzZb0esWz4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness</title><source>Wiley Online Library All Journals</source><creator>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Sainte‐Marie, Maxime Holmberg ; Shokida, Natsumi Solange ; Benz, Pierre ; Poitras, Constance ; Ninkov, Anton Boudreau ; Ebrahimy, Saeideh ; Ayeni, Philips ; Filali, Sarra ; Li, Bing ; Larivière, Vincent</creator><creatorcontrib>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Sainte‐Marie, Maxime Holmberg ; Shokida, Natsumi Solange ; Benz, Pierre ; Poitras, Constance ; Ninkov, Anton Boudreau ; Ebrahimy, Saeideh ; Ayeni, Philips ; Filali, Sarra ; Li, Bing ; Larivière, Vincent</creatorcontrib><description>Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English‐language publications, underestimating the use of other languages in research dissemination. 
Launched in 2022, OpenAlex promised comprehensive, inclusive, and open‐source research information. While already in use by scholars and research institutions, the quality of its metadata is currently still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in‐depth manual validation of a sample of 6836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing, but more work is needed at infrastructural level to ensure the quality of metadata on language.</description><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.24979</identifier><language>eng</language><ispartof>Journal of the Association for Information Science and Technology, 2025-01</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c154t-bd060a45ff1bf1c360119cefaeceecd31c7ad5f04330ffc980af2a62121a1fa93</cites><orcidid>0000-0001-5896-3377 ; 0009-0007-5058-6352</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Céspedes, Lucía</creatorcontrib><creatorcontrib>Kozlowski, Diego</creatorcontrib><creatorcontrib>Pradier, Carolina</creatorcontrib><creatorcontrib>Sainte‐Marie, Maxime Holmberg</creatorcontrib><creatorcontrib>Shokida, Natsumi Solange</creatorcontrib><creatorcontrib>Benz, 
Pierre</creatorcontrib><creatorcontrib>Poitras, Constance</creatorcontrib><creatorcontrib>Ninkov, Anton Boudreau</creatorcontrib><creatorcontrib>Ebrahimy, Saeideh</creatorcontrib><creatorcontrib>Ayeni, Philips</creatorcontrib><creatorcontrib>Filali, Sarra</creatorcontrib><creatorcontrib>Li, Bing</creatorcontrib><creatorcontrib>Larivière, Vincent</creatorcontrib><title>Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness</title><title>Journal of the Association for Information Science and Technology</title><description>Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English‐language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open‐source research information. While already in use by scholars and research institutions, the quality of its metadata is currently still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in‐depth manual validation of a sample of 6836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. 
If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing, but more work is needed at infrastructural level to ensure the quality of metadata on language.</description><issn>2330-1635</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNo9kE1Lw0AQhhdRsNQe_Ad79ZA6s5ukjbdSqhUKveg5TCezNZIvspti_72piqf3gXdmYB6l7hHmCGAeyZdzE2eL7EpNjLUQYRrb63-2ya2aef8JAAjZMjE4UcXmRNVAoWyOOnyIrkYYSh9K1tyepKej6NbpfSfNqpIv_aRXjSbvxftamnDpaglUUCBNzENPfNbUFON23VUSpBkn79SNo8rL7C-n6v1587beRrv9y-t6tYsYkzhEhwJSoDhxDg8O2aaAmLE4EhbhwiIvqEgcxOM7znG2BHKGUoMGCR1ldqoefu9y33rfi8u7vqypP-cI-cVQPhrKfwzZb0esWz4</recordid><startdate>20250114</startdate><enddate>20250114</enddate><creator>Céspedes, Lucía</creator><creator>Kozlowski, Diego</creator><creator>Pradier, Carolina</creator><creator>Sainte‐Marie, Maxime Holmberg</creator><creator>Shokida, Natsumi Solange</creator><creator>Benz, Pierre</creator><creator>Poitras, Constance</creator><creator>Ninkov, Anton Boudreau</creator><creator>Ebrahimy, Saeideh</creator><creator>Ayeni, Philips</creator><creator>Filali, Sarra</creator><creator>Li, Bing</creator><creator>Larivière, Vincent</creator><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-5896-3377</orcidid><orcidid>https://orcid.org/0009-0007-5058-6352</orcidid></search><sort><creationdate>20250114</creationdate><title>Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness</title><author>Céspedes, Lucía ; Kozlowski, Diego ; Pradier, Carolina ; Sainte‐Marie, Maxime Holmberg ; Shokida, Natsumi Solange ; Benz, Pierre ; Poitras, Constance ; Ninkov, Anton Boudreau ; Ebrahimy, Saeideh ; Ayeni, Philips ; Filali, Sarra ; Li, Bing ; Larivière, 
Vincent</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c154t-bd060a45ff1bf1c360119cefaeceecd31c7ad5f04330ffc980af2a62121a1fa93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Céspedes, Lucía</creatorcontrib><creatorcontrib>Kozlowski, Diego</creatorcontrib><creatorcontrib>Pradier, Carolina</creatorcontrib><creatorcontrib>Sainte‐Marie, Maxime Holmberg</creatorcontrib><creatorcontrib>Shokida, Natsumi Solange</creatorcontrib><creatorcontrib>Benz, Pierre</creatorcontrib><creatorcontrib>Poitras, Constance</creatorcontrib><creatorcontrib>Ninkov, Anton Boudreau</creatorcontrib><creatorcontrib>Ebrahimy, Saeideh</creatorcontrib><creatorcontrib>Ayeni, Philips</creatorcontrib><creatorcontrib>Filali, Sarra</creatorcontrib><creatorcontrib>Li, Bing</creatorcontrib><creatorcontrib>Larivière, Vincent</creatorcontrib><collection>CrossRef</collection><jtitle>Journal of the Association for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Céspedes, Lucía</au><au>Kozlowski, Diego</au><au>Pradier, Carolina</au><au>Sainte‐Marie, Maxime Holmberg</au><au>Shokida, Natsumi Solange</au><au>Benz, Pierre</au><au>Poitras, Constance</au><au>Ninkov, Anton Boudreau</au><au>Ebrahimy, Saeideh</au><au>Ayeni, Philips</au><au>Filali, Sarra</au><au>Li, Bing</au><au>Larivière, Vincent</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness</atitle><jtitle>Journal of the Association for Information Science and Technology</jtitle><date>2025-01-14</date><risdate>2025</risdate><issn>2330-1635</issn><eissn>2330-1643</eissn><abstract>Clarivate's Web of Science (WoS) and Elsevier's 
Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English‐language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open‐source research information. While already in use by scholars and research institutions, the quality of its metadata is currently still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in‐depth manual validation of a sample of 6836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing, but more work is needed at infrastructural level to ensure the quality of metadata on language.</abstract><doi>10.1002/asi.24979</doi><orcidid>https://orcid.org/0000-0001-5896-3377</orcidid><orcidid>https://orcid.org/0009-0007-5058-6352</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2330-1635
ispartof Journal of the Association for Information Science and Technology, 2025-01
issn 2330-1635
2330-1643
language eng
recordid cdi_crossref_primary_10_1002_asi_24979
source Wiley Online Library All Journals
title Evaluating the linguistic coverage of OpenAlex : An assessment of metadata accuracy and completeness
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A02%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20the%20linguistic%20coverage%20of%20OpenAlex%20:%20An%20assessment%20of%20metadata%20accuracy%20and%20completeness&rft.jtitle=Journal%20of%20the%20Association%20for%20Information%20Science%20and%20Technology&rft.au=C%C3%A9spedes,%20Luc%C3%ADa&rft.date=2025-01-14&rft.issn=2330-1635&rft.eissn=2330-1643&rft_id=info:doi/10.1002/asi.24979&rft_dat=%3Ccrossref%3E10_1002_asi_24979%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true