OPMP: An Omnidirectional Pyramid Mask Proposal Network for Arbitrary-Shape Scene Text Detection

Scene text detection methods have achieved significant progresses. However, stack-omnidirectional text dilemma, under-segmentation of very close text words, and over-segmentation of arbitrary-shape long text lines, are still main challenges. Motivated by these problems, we proposed a two stage metho...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on multimedia 2021, Vol.23, p.454-467
Hauptverfasser: Zhang, Sheng, Liu, Yuliang, Jin, Lianwen, Wei, Zhongrong, Shen, Chunhua
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Scene text detection methods have achieved significant progresses. However, stack-omnidirectional text dilemma, under-segmentation of very close text words, and over-segmentation of arbitrary-shape long text lines, are still main challenges. Motivated by these problems, we proposed a two stage method called omnidirectional pyramid mask proposal text detector (OPMP). OPMP removes anchor mechanism that requires heuristic non-maximum suppress processing. Instead, it uses an effective pyramid lengthwise and sidewise residual sequence modeling method to produce arbitrary-shape proposals. To accurately extract the features of text shape, OPMP enhances the backbone layers by a multiple arbitrary-shape fitting mechanism. Finally, a multi-grain text classification module is proposed, which reclassifies each text region robustly. Comprehensive ablation studies demonstrate the effectiveness of each proposed component. In addition, experiments on various benchmarks, including ICDAR2015, MLT, MSRA-TD500, CTW1500, and Total-text, show that our method outperforms previous state-of-the-art methods.
ISSN:1520-9210
1941-0077
DOI:10.1109/TMM.2020.2978630