An end-to-end network for irregular printed Mongolian recognition

Bibliographic details
Published in: International journal on document analysis and recognition, 2022-03, Vol. 25 (1), p. 41-50
Main authors: Cui, ShaoDong; Su, YiLa; Qing dao er ji, Ren; Ji, YaTu
Format: Article
Language: English
Online access: Full text
Description
Abstract: Mongolian is a language spoken in Inner Mongolia, China. During recognition, the shooting angle and other factors can deform the image and its text, which makes recognition more difficult. This paper proposes a triplet attention Mogrifier network (TAMN) for printed Mongolian text recognition. The network uses a spatial transformation network to correct deformed Mongolian images. It then uses gated recurrent convolution layers (GRCL) combined with a triplet attention module to extract features from the corrected images. A Mogrifier long short-term memory (LSTM) network captures the contextual sequence information in the features, and the decoder's LSTM attention produces the final prediction. Experimental results show that, with the spatial transformation network, deformed Mongolian images can be recognized effectively, and the recognition accuracy can reach 90.30%. The network performs well on Mongolian text recognition compared with current mainstream text recognition networks. The dataset is publicly available at: https://github.com/ShaoDonCui/Mongolian-recognition
ISSN: 1433-2833, 1433-2825
DOI: 10.1007/s10032-021-00388-y
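
Illustrative note: the abstract names a Mogrifier LSTM as the stage that captures sequence context. The sketch below is not taken from the paper or its repository. It only illustrates the general Mogrifier idea, in which the input x and the previous hidden state h alternately rescale each other through gates of the form 2·sigmoid(·) before they would enter an ordinary LSTM cell. The function names, weight shapes, number of rounds, and toy values are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mogrify(x, h, q_mats, r_mats):
    """Hypothetical helper: alternately gate x by h, then h by x.

    q_mats[k] has shape (len(x), len(h)); r_mats[k] has shape (len(h), len(x)).
    Each round first rescales x with a gate computed from h, then
    (if an R matrix is supplied for that round) rescales h with a gate
    computed from the updated x. Gates lie in the open interval (0, 2).
    """
    x, h = list(x), list(h)
    for k, q in enumerate(q_mats):
        gate_x = [2.0 * sigmoid(g) for g in matvec(q, h)]
        x = [g * xi for g, xi in zip(gate_x, x)]
        if k < len(r_mats):
            gate_h = [2.0 * sigmoid(g) for g in matvec(r_mats[k], x)]
            h = [g * hi for g, hi in zip(gate_h, h)]
    return x, h


# Toy values (assumed): a 2-dimensional input and a 3-dimensional hidden state.
x0 = [1.0, -2.0]
h0 = [0.5, 0.25, -1.0]

# With all-zero weights every gate equals 2 * sigmoid(0) = 1,
# so x and h pass through unchanged.
zero_q = [[0.0] * 3 for _ in range(2)]
zero_r = [[0.0] * 2 for _ in range(3)]
x1, h1 = mogrify(x0, h0, [zero_q], [zero_r])

# With a non-zero weight, the first component of x is rescaled by a gate in (0, 2).
q = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
x2, h2 = mogrify(x0, h0, [q], [])
```

In a full model the gated x and h would then be fed to a standard LSTM cell; that cell, and the learned values of the weight matrices, are omitted here.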