Diorisis.duckdb

Bibliographic Details
Main Author:	Bilby, Mark
Format:	Dataset
Language:	eng

Description
Summary:	DuckDB Compilation of the Diorisis Ancient Greek Corpus

Description
The Diorisis Ancient Greek Corpus was created by Barbara McGillivray and Alessandro Vatri with sponsorship and funding from the Alan Turing Institute. The original XML files are collectively available at https://www.doi.org/10.6084/m9.figshare.6187256. An article introducing the corpus is available as: Vatri, A., & McGillivray, B. (2018). The Diorisis Ancient Greek Corpus: Linguistics and Literature. Research Data Journal for the Humanities and Social Sciences, 3(1), 55-65. https://doi.org/10.1163/24523666-01000013
As the description states, the Diorisis corpus consists of "820 texts spanning between the beginnings of the AG literary tradition (Homer) and the fifth century AD, and it counts 10,206,421 words".

Rights and Permissions
The original Diorisis corpus is archived under a CC BY 4.0 international license. The Diorisis DuckDB database of the corpus, archived here for the first time, was built and compiled by Mark G. Bilby and is archived here under a CC BY-NC-ND 4.0 license. This license allows anyone to download, use, and modify the DuckDB database for analysis and queries, but not to distribute a derivative database or to use the database or its derivatives as part of a commercial product or offering. Any other rights or permissions requests, or requests for clarification, can be sent to Mark.

Database Structure
The database contains two tables, "document" and "word". The table structures are as follows:

TABLE document
comb_tlg_id VARCHAR PRIMARY KEY [TLG id conformed to GlauX format]
author VARCHAR [work author]
title VARCHAR [work title]
genre VARCHAR [work genre]
subgenre VARCHAR [work subgenre]
date_created VARCHAR [work date created]
sent_count INT [sentence count]
word_count INT [word count]
punct_count INT [punctuation count]
location VARCHAR [location of composition]
glaux BOOLEAN [TRUE/FALSE document also in current GlauX corpus]

TABLE word
word_key VARCHAR PRIMARY KEY [word unique id]
comb_tlg_id VARCHAR [FOREIGN KEY, TLG id conformed to GlauX format]
sent_id VARCHAR [document sentence id]
seq_id VARCHAR [document word id]
self_word_id VARCHAR [sentence word id]
self_form VARCHAR [word form]
self_lemma_id VARCHAR [word lemma id]
self_lemma VARCHAR [word lemma]
self_pos VARCHAR [word part of speech]
self_person VARCHAR [word person]
self_number VARCHAR [word number]
self_tense VARCHAR [word tense]
self_mood VARCHAR [word mood]
self_voice VARCHAR [word voice]
self_gender VARCHAR [word
DOI:	10.5281/zenodo.11261145
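
Example query (illustrative)
The two-table structure described in the summary lends itself to joins on comb_tlg_id. The following is a minimal sketch, not part of the original record, of how lemma frequencies for one author might be counted. It assumes the third-party duckdb Python package is installed and that the database file has been downloaded locally as Diorisis.duckdb (taken from the record title; the actual file name may differ). The helper names open_diorisis and top_lemmas are hypothetical, and the exact spelling of values in the author column is not documented here, so the author string must be checked against the data.

```python
from typing import Any, List, Tuple

# Count how often each lemma occurs in all works by one author.
# Table and column names come from the record's "Database Structure" section.
LEMMA_FREQ_SQL = """
SELECT w.self_lemma, COUNT(*) AS n
FROM word AS w
JOIN document AS d ON d.comb_tlg_id = w.comb_tlg_id
WHERE d.author = ?
GROUP BY w.self_lemma
ORDER BY n DESC, w.self_lemma
"""


def open_diorisis(path: str = "Diorisis.duckdb") -> Any:
    """Open the database read-only. The default path is an assumption."""
    import duckdb  # third-party package, imported lazily

    return duckdb.connect(path, read_only=True)


def top_lemmas(con: Any, author: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return up to `limit` (lemma, count) pairs for the given author.

    `con` is any connection whose execute(sql, params) result supports
    fetchmany(), which a DuckDB connection does.
    """
    rows = con.execute(LEMMA_FREQ_SQL, [author]).fetchmany(limit)
    return [(lemma, count) for lemma, count in rows]
```

A call might look like top_lemmas(open_diorisis(), "Homer", 20), where "Homer" is only a guess at how that author is recorded in the author column. Opening the file read-only avoids changing the downloaded copy by accident.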