Extraction of Authors' Characteristics from Japanese Modern Sentences via N-gram Distribution

Bibliographic Details
Main authors:	MATSUURA, Tsukasa, KANADA, Yasumasa
Format:	Book chapter
Language:	English
Subjects:	Average Accuracy; Computers in experimental physics; Exact sciences and technology; Instruments, apparatus, components and techniques common to several branches of physics and astronomy; Natural Language Processing; Origin Text; Physics; Preliminary Processing; Text Data
Online access:	Full text
Description
Summary:	Many studies of authorship attribution have dealt with text data in which the boundaries between words are obvious [1] [2]. When these studies are applied to languages in which sentences cannot be divided unambiguously into words, such as Japanese or Chinese, preliminary processing of the text data, such as morphological analysis, is required and may influence the final results. Methods that rely on characteristics of particular languages or particular compositions also have limited coverage [3]. Extracting authors’ characteristics from sentences remains, in general, an unsolved problem. We therefore propose a method for authorship attribution based on the distribution of character n-grams in sentences. The proposed method can analyze sentences without any additional information, i.e. without preliminary analyses. Experiments in which the 3-grams that represent an author’s characteristics were extracted on the basis of their distributions are also reported.
ISSN:	0302-9743, 1611-3349
DOI:	10.1007/3-540-44418-1_38
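
Illustration:	The summary describes comparing authors by the distribution of character n-grams, with no word segmentation step. The following is a minimal sketch of that general idea, not the authors' procedure. This record does not say how distributions are compared or how characteristic 3-grams are selected, so the L1 distance used here is an assumption, and the function names `char_ngrams`, `relative_frequencies`, and `l1_distance` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; no word segmentation is needed."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Turn raw n-gram counts into a relative-frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def l1_distance(p: dict, q: dict) -> float:
    """Sum of absolute frequency differences over all n-grams seen in p or q.

    Assumption: this measure is chosen only for illustration; the record
    does not state which measure the chapter uses.
    """
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


# Usage: compare two short texts by their character 3-gram distributions.
text_a = "吾輩は猫である。名前はまだ無い。"
text_b = "吾輩は犬である。名前はもうある。"
dist_a = relative_frequencies(char_ngrams(text_a))
dist_b = relative_frequencies(char_ngrams(text_b))
print(l1_distance(dist_a, dist_b))
```

Because the n-grams are taken over raw characters, the same code applies to Japanese or Chinese text without morphological analysis. A smaller distance indicates more similar 3-gram usage under this illustrative measure.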