Writer identification based on letter frequency distribution

Bibliographic details
Main authors: Diurdeva, Polina; Mikhailova, Elena; Shalymov, Dmitry
Format: Conference proceeding
Language: English
Online access: Full text
Description
Abstract: Writer identification has recently become a pressing problem because of the large volume of documents available in digital form. This work investigates an approach based on the frequencies of letter combinations for classifying documents by authorship. It examines and compares four distance measures between a text of unknown authorship and an author's profile: the L1 measure, the Kullback-Leibler divergence, the base metric of the Common N-gram (CNG) method [8], and a variation of the CNG dissimilarity measure proposed in [12]. The comparison identifies cases in which one metric outperforms the others for a specific combination of parameters. Experiments are conducted on several Russian and English corpora.
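To make the measures named in the abstract concrete, here is a minimal Python sketch of three of them applied to letter n-gram frequency profiles. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice n=2, the dropping of non-letter characters, and the additive smoothing constant `eps` in the Kullback-Leibler divergence are all hypothetical choices made for this example. The CNG dissimilarity uses the standard form, a sum of squared differences each normalized by the mean of the two frequencies. The variation from [12] is not shown because the record does not define it.

```python
from collections import Counter
import math


def ngram_profile(text, n=2):
    """Relative frequencies of letter n-grams.

    Assumption for this sketch: text is lowercased and non-letters are
    dropped, so n-grams may span word boundaries.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    grams = ("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def l1_distance(p, q):
    """Sum of absolute frequency differences over the union of n-grams."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for non-empty profiles.

    Assumption: additive smoothing with eps avoids log(0) for n-grams
    that occur in only one of the two profiles.
    """
    keys = set(p) | set(q)
    norm = 1.0 + eps * len(keys)
    total = 0.0
    for k in keys:
        pk = (p.get(k, 0.0) + eps) / norm
        qk = (q.get(k, 0.0) + eps) / norm
        total += pk * math.log(pk / qk)
    return total


def cng_dissimilarity(p, q):
    """Standard CNG dissimilarity: squared difference over mean frequency."""
    keys = set(p) | set(q)
    total = 0.0
    for k in keys:
        fp, fq = p.get(k, 0.0), q.get(k, 0.0)
        # The denominator is never zero: k occurs in at least one profile.
        total += ((fp - fq) / ((fp + fq) / 2.0)) ** 2
    return total


if __name__ == "__main__":
    # Toy strings standing in for an unknown text and an author's corpus.
    unknown = ngram_profile("the quick brown fox jumps over the lazy dog")
    author = ngram_profile("a quick movement of the enemy will jeopardize")
    for measure in (l1_distance, kl_divergence, cng_dissimilarity):
        print(measure.__name__, measure(unknown, author))
```

In an attribution setting, the unknown text would be assigned to the author whose profile yields the smallest value of the chosen measure. The original CNG method also truncates each profile to its most frequent n-grams, a step omitted here for brevity.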
ISSN: 2305-7254, 2343-0737
DOI: 10.23919/FRUCT.2016.7892179