Author identification of literary works based on text analysis and deep learning


Bibliographic details
Published in: Heliyon 2024-02, Vol. 10 (3), p. e25464-e25464, Article e25464
Main author: Tang, Xu
Format: Article
Language: English
Description
Abstract: Advances in science have steadily improved the analysis of speech, images, and other signals, but the analysis of Chinese text remains a hard problem. Chinese text analysis requires not only statistics but also semantic comprehension. Different text types need different models of language style features to obtain good recognition results. In this study, we use deep learning to construct an automatic text feature extraction model and treat author identification as a classification task with the author as the label. The proposed literature author recognition model is divided into three phases: text preprocessing, feature extraction, and classification. Each phase consists of several small modules or steps. First, the corpus is fed to Word2Vec to generate word vectors. Then, an improved text feature extractor based on a CNN and attention extracts the text features, which serve as the input of the CNN convolution layer. After convolution, the outputs are combined bit by bit to form the Window Feature Sequence, which is the text feature vector. Next, in the LSTM and Softmax classification stage, the Window Feature Sequence is used as the input of the LSTM to obtain two one-dimensional vectors, which are spliced by a concatenate layer. Finally, the result is classified through a fully connected layer, a Batch Normalization layer, and Softmax. The performance of the proposed model in recognizing authors of Chinese literature was evaluated on two datasets. The data we collected included works of different forms, such as prose and fiction. The results show that the proposed model can effectively identify authors, and its classification accuracy is significantly better than that of the benchmark model.
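The data flow described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the author's implementation. The assumptions are: random vectors stand in for Word2Vec output, all sizes and names (`N_WORDS`, `WINDOW`, `window_feature_sequence`, `lstm_final_state`, and so on) are hypothetical, the attention and Batch Normalization steps are omitted, and the "two one-dimensional vectors" are taken to be the final states of a forward and a backward LSTM pass, which the abstract does not specify.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def window_feature_sequence(word_vectors, filters, window):
    """Row t holds every filter's ReLU response to words t .. t+window-1."""
    sequence = []
    for t in range(len(word_vectors) - window + 1):
        # Flatten the window of word vectors into one long vector.
        flat = [x for vec in word_vectors[t:t + window] for x in vec]
        sequence.append([max(0.0, s) for s in matvec(filters, flat)])
    return sequence

def lstm_final_state(sequence, weights, hidden):
    """Run a plain LSTM (no biases) over the sequence; return the last hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        z = x + h  # concatenate current input with previous hidden state
        i = [sigmoid(v) for v in matvec(weights["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(weights["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(weights["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(weights["g"], z)]  # candidate cell
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy sizes, not taken from the paper.
N_WORDS, EMBED_DIM, WINDOW, N_FILTERS, HIDDEN, N_AUTHORS = 12, 8, 3, 6, 5, 4

# Stand-in for Word2Vec output: one random vector per word of a text.
word_vectors = rand_matrix(N_WORDS, EMBED_DIM)
filters = rand_matrix(N_FILTERS, WINDOW * EMBED_DIM)

def lstm_weights():
    return {gate: rand_matrix(HIDDEN, N_FILTERS + HIDDEN) for gate in "ifog"}

features = window_feature_sequence(word_vectors, filters, WINDOW)
# Assumption: the two vectors come from a forward and a backward pass.
forward = lstm_final_state(features, lstm_weights(), HIDDEN)
backward = lstm_final_state(features[::-1], lstm_weights(), HIDDEN)
spliced = forward + backward  # list concatenation plays the concatenate layer

dense = rand_matrix(N_AUTHORS, 2 * HIDDEN)  # fully connected layer
author_probs = softmax(matvec(dense, spliced))
print(len(features), len(spliced), [round(p, 3) for p in author_probs])
```

With a window of `WINDOW` words over `N_WORDS` word vectors, the Window Feature Sequence has `N_WORDS - WINDOW + 1` rows, one per window position, and each row has one entry per filter. Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how shapes move from word vectors to a distribution over candidate authors.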
ISSN: 2405-8440
DOI: 10.1016/j.heliyon.2024.e25464