A Historical Handwritten French Manuscripts Text Detection Method in Full Pages

Historical handwritten manuscripts pose challenges to automated recognition techniques due to their unique handwriting styles and cultural backgrounds. In order to solve the problems of complex text word misdetection, omission, and insufficient detection of wide-pitch curved text, this study propose...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information (Basel) 2024-08, Vol.15 (8), p.483
Hauptverfasser: Sang, Rui, Zhao, Shili, Meng, Yan, Zhang, Mingxian, Li, Xuefei, Xia, Huijie, Zhao, Ran
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Historical handwritten manuscripts pose challenges to automated recognition techniques due to their unique handwriting styles and cultural backgrounds. In order to solve the problems of complex text word misdetection, omission, and insufficient detection of wide-pitch curved text, this study proposes a high-precision text detection method based on improved YOLOv8s. Firstly, the Swin Transformer is used to replace C2f at the end of the backbone network to solve the shortcomings of fine-grained information loss and insufficient learning features in text word detection. Secondly, the Dysample (Dynamic Upsampling Operator) method is used to retain more detailed features of the target and overcome the shortcomings of information loss in traditional upsampling to realize the text detection task for dense targets. Then, the LSK (Large Selective Kernel) module is added to the detection head to dynamically adjust the feature extraction receptive field, which solves the cases of extreme aspect ratio words, unfocused small text, and complex shape text in text detection. Finally, in order to overcome the CIOU (Complete Intersection Over Union) loss in target box regression with unclear aspect ratio, insensitive to size change, and insufficient correlation between target coordinates, Gaussian Wasserstein Distance (GWD) is introduced to modify the regression loss to measure the similarity between the two bounding boxes in order to obtain high-quality bounding boxes. Compared with the State-of-the-Art methods, the proposed method achieves optimal performance in text detection, with the precision and mAP@0.5 reaching 86.3% and 82.4%, which are 8.1% and 6.7% higher than the original method, respectively. The advancement of each module is verified by ablation experiments. The experimental results show that the method proposed in this study can effectively realize complex text detection and provide a powerful technical means for historical manuscript reproduction.
ISSN:2078-2489
DOI:10.3390/info15080483