Opening a Free Path to Analyze the Discourse Shift in the Soviet Belarusian Newspaper Zviazda after the Molotov-Ribbentrop Pact


Bibliographic details
Published in: Journal of Open Humanities Data 2023-11, Vol.9 (3), p.23-23
Main author: Boizou, Loïc
Format: Article
Language: English
Online access: Full text
Description
Summary: This paper attempts to develop a pipeline designed to convert graphical PDF files of the newspaper Zviazda into usable text data in the Belarusian language with search and visualization options. Apart from punctual conversion scripts to allow navigating between formats, the pipeline relies on freely available resources in order to process this relatively under-resourced language (at least for freely available resources). This pipeline was designed to include a graph database and to be compatible with data visualization tools. The ultimate goal is to develop a resource to analyze the political discourse in the Soviet Belarusian press during the Second World War. With a view to validating the pipeline, a pilot study was carried out: it aims to visualize some simple manifestations of the Soviet rhetorical shift about Nazi Germany after the signing of the Molotov-Ribbentrop Pact in order to prove that some useful phenomenon can be revealed even with quite noisy data.
Keywords: NLP, Belarusian language, Graph databases, Discourse, Soviet press
ISSN: 2059-481X
DOI: 10.5334/johd.133