Extraction of Authors' Characteristics from Japanese Modern Sentences via N-gram Distribution
Format: | Book chapter |
---|---|
Language: | English |
Abstract: | Many studies of authorship attribution have dealt with texts in which word boundaries are explicit [1] [2]. Applying these methods to languages whose sentences are not explicitly divided into words, such as Japanese or Chinese, requires preprocessing such as morphological analysis, which may influence the final results. Methods that exploit features of particular languages or particular kinds of composition also have limited coverage [3]. In general, extracting authors' characteristics from sentences remains an unsolved problem. We therefore propose a method for authorship attribution based on the distribution of character n-grams in sentences. The proposed method analyzes sentences without any additional information, i.e., without preliminary analyses. We also report experiments in which character 3-grams representing each author's characteristics were extracted on the basis of their distributions. |
ISSN: | 0302-9743 (print), 1611-3349 (electronic) |
DOI: | 10.1007/3-540-44418-1_38 |
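The abstract does not spell out how the 3-grams are scored. The following is a minimal sketch under stated assumptions. It counts overlapping character n-grams directly from unsegmented Japanese text, so no morphological analysis is involved. It then ranks one author's 3-grams against other authors using a frequency-ratio heuristic. The function names `char_ngrams`, `relative_freqs` and `characteristic_ngrams`, the whitespace stripping, and the ratio score are all hypothetical illustrations, not the authors' published procedure.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; no word segmentation is needed."""
    # Assumption: whitespace and newlines are dropped so only the sentence
    # characters remain (the original preprocessing is not specified).
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def relative_freqs(counts: Counter) -> dict:
    """Turn raw n-gram counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def characteristic_ngrams(author_text: str, other_texts: list, n: int = 3,
                          top: int = 5, eps: float = 1e-6) -> list:
    """Hypothetical heuristic: rank the author's n-grams by the ratio of their
    relative frequency to that in the pooled texts of other authors.
    Illustrative only; this is not the method described in the chapter."""
    target = relative_freqs(char_ngrams(author_text, n))
    pooled = Counter()
    for t in other_texts:
        pooled += char_ngrams(t, n)
    others = relative_freqs(pooled)
    # eps keeps the ratio finite for n-grams the other authors never use.
    scores = {g: p / (others.get(g, 0.0) + eps) for g, p in target.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

For example, `char_ngrams("吾輩は猫である")` yields the five 3-grams 吾輩は, 輩は猫, は猫で, 猫であ and である, each counted once. Working on raw characters is what lets this kind of approach skip word segmentation. The ratio score is only one of many possible ways to compare distributions.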