Wikipedia Cultural Diversity Dataset
Main authors: | |
---|---|
Format: | Dataset |
Language: | English |
Keywords: | |
Summary: | For each existing Wikipedia language edition, the dataset contains a classification of the articles that represent its associated cultural context, i.e. all concepts and entities related to the language and to the territories where it is spoken (places, traditions, language, politics, agriculture, biographies, events, etc.). For each article, the dataset contains a rich set of context-related features, including geolocation, ISO codes, Wikidata properties related to the language or to the corresponding country or territories, and related categories, among other metadata. General article features, such as the number of edits and the number of page views, are also included. The machine-learning methodology used to classify the articles is described in "Wikipedia Cultural Diversity Observatory: Cultural Context Content Methodology" and in Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers in Physics. The dataset has several uses, of which we highlight three: 1) assessing the Wikipedia culture gap and improving overall cultural diversity, 2) academic research in the Digital Humanities, and 3) technologies based on user-generated content. You can read more at wcdo.wmflabs.org. |
DOI: | 10.6084/m9.figshare.7039514 |