Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia


Bibliographic Details
Published in:	Lobachevskii journal of mathematics 2023, Vol.44 (1), p.111-122
Main authors:	Fedorov, P. E., Mironov, A. V., Chernishev, G. A.
Format:	Article
Language:	English
Subjects:	Algebra; Analysis; Encyclopedias; Geometry; Information management; Knowledge bases (artificial intelligence); Mathematical Logic and Foundations; Mathematics; Mathematics and Statistics; Probability Theory and Stochastic Processes; Tables (data); Toolkits; Webs

Description
Abstract:	Corpora that contain tabular data, such as WebTables, are a vital resource for the academic community. Essentially, they are the backbone of any modern research in information management. They are used for various tasks such as data extraction, knowledge base construction, question answering, column semantic type detection, and many others. Such corpora are useful not only as a source of data, but also as a base for building test datasets. So far, there have been no such corpora for the Russian language, and this has seriously hindered research in the aforementioned areas. In this paper, we present the first corpus of Web tables created specifically from Russian-language material. It was built with a special toolkit we have developed to crawl the Russian Wikipedia. Both the corpus and the toolkit are open-source and publicly available. Finally, we present a short study that describes Russian Wikipedia tables and their statistics.
ISSN:	1995-0802 1818-9962
DOI:	10.1134/S1995080223010110