Analysis of Text Non-Homogeneity Using Markers

Bibliographic details
Published in:	Lithuanian Journal of Statistics 2015-12, Vol.54 (1), p.92-100
Main authors:	Lapėnaitė-Gedvilė, Monika; Piaseckienė, Karolina; Radavičius, Marijus
Format:	Article
Language:	English
Keywords:	binomial logistic regression; deviance; functional words; over-dispersion; statistical linguistics
Online access:	Full text

Description
Abstract:	The aim of the paper is to assess the distributional non-homogeneity of texts in the usage of functional words and other linguistic units. Our empirical study is based on recommended school fiction works taken from a digital library at http://ebiblioteka.mkp.emokykla.lt. Sets of frequent word forms, called markers, are constructed, and their frequency counts in blocks of 50 successive sentences are calculated. The frequency counts of the markers show significant excess variability (overdispersion) with respect to a text homogeneity model usually assumed in linguistics. For chosen markers, different kinds of hierarchical binomial logistic regression models with the author's identifier, the block length and the frequency counts of the remaining markers as explanatory variables are fitted to the block data in order to explain the observed overdispersion.
ISSN:	1392-642X 2029-7262
DOI:	10.15388/LJS.2015.13884
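
Illustrative note: the block-wise overdispersion check described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names `block_counts` and `pearson_dispersion` are hypothetical, tokenization is a naive whitespace split, and the Pearson chi-square statistic is swapped in as a simple stand-in for the deviance-based analysis named in the keywords. Under a homogeneous binomial model the returned ratio should be close to 1; values well above 1 suggest overdispersion.

```python
def block_counts(sentences, markers, block_len=50):
    """Split sentences into blocks of `block_len` successive sentences and
    return (marker_counts, block_sizes), both measured in word tokens.

    Assumption: tokens are obtained by a naive whitespace split with
    surrounding punctuation stripped; a real study would use a proper tokenizer.
    """
    marker_set = {m.lower() for m in markers}
    counts, sizes = [], []
    for start in range(0, len(sentences), block_len):
        block = sentences[start:start + block_len]
        words = [w.strip(".,;:!?\"'()").lower() for s in block for w in s.split()]
        words = [w for w in words if w]  # drop tokens that were pure punctuation
        counts.append(sum(w in marker_set for w in words))
        sizes.append(len(words))
    return counts, sizes


def pearson_dispersion(marker_counts, block_sizes):
    """Pearson chi-square statistic divided by its degrees of freedom,
    computed against a single pooled marker probability (homogeneity model)."""
    if len(marker_counts) != len(block_sizes) or len(marker_counts) < 2:
        raise ValueError("need at least two blocks of matching length")
    p = sum(marker_counts) / sum(block_sizes)  # pooled marker proportion
    if not 0.0 < p < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    chi2 = sum(
        (y - n * p) ** 2 / (n * p * (1.0 - p))
        for y, n in zip(marker_counts, block_sizes)
    )
    return chi2 / (len(marker_counts) - 1)
```

A typical call would pass the per-block output of `block_counts` straight into `pearson_dispersion`; the regression models with author, block length and other markers as covariates mentioned in the abstract are beyond this sketch.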