Gender classification in classical fiction: A computational analysis of 1113 fictions

Recent decades have witnessed the rapid development of literary studies on gender and writing style. One of the common limitations of previous studies is that they analyze only a few texts, which some researchers have already pointed out. In this study, we attempt to find the features that best faci...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Mathematical Biosciences and Engineering 2022-09, Vol.19 (9), p.8892-8907
Hauptverfasser: Zhu, Dan, Yang, Liru, Liang, Xin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recent decades have witnessed the rapid development of literary studies on gender and writing style. One of the common limitations of previous studies is that they analyze only a few texts, which some researchers have already pointed out. In this study, we attempt to find the features that best facilitate the classification of texts by authorial gender. Based on a corpus of 1113 classical fictions from the early 19 th century to the early 20 th century. Eight algorithms, including SVM, random forest, decision tree, AdaBoost, logistic regression, K-nearest neighbors, gradient boosting and XGBoost, are used to automatically select the features that are most useful for properly categorizing a text. We find that word frequency is the most important predictor for identifying authorial gender in classical fictions, achieving an accuracy rate of 92%. We also find that nationhood is not particularly impactful when dealing with authorial gender differences in classical fictions, as genderlectal variation is 'universal' in the English-speaking world.
ISSN:1551-0018
1551-0018
DOI:10.3934/mbe.2022412