A benchmark of Spanish language datasets for computationally driven research

In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of information science 2023-12, Vol.49 (6), p.1451-1461
Hauptverfasser: Candela, Gustavo, Sáez, María-Dolores, Escobar, Pilar, Marco-Such, Manuel
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes.
ISSN:0165-5515
1741-6485
DOI:10.1177/01655515211060530