Scene Text Segmentation via Multi-Task Cascade Transformer with Paired Data Synthesis

The scene text segmentation task provides a wide range of practical applications. However, the number of images in the available datasets for scene text segmentation is not large enough to effectively train deep learning-based models, leading to limited performance. To solve this problem, we employ...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2023-01, Vol.11, p.1-1
Hauptverfasser: Dang, Quang-Vinh, Lee, Guee-Sang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The scene text segmentation task provides a wide range of practical applications. However, the number of images in the available datasets for scene text segmentation is not large enough to effectively train deep learning-based models, leading to limited performance. To solve this problem, we employ paired data generation to secure sufficient data samples for text segmentation via Text Image-conditional GANs. Furthermore, existing models implicitly model text attributes such as size, layout, font, and structure, which hinders their performance. To remedy this, we propose a Multi-task Cascade Transformer network that explicitly learns these attributes using large volumes of generated synthetic data. The transformer-based network includes two auxiliary tasks and one main task for text segmentation. The auxiliary tasks help the network learn text regions to focus on, as well as the structure of the text through different words and fonts, to support the main task. To bridge the gap between different datasets, we train the proposed network on paired synthetic data before fine-tuning it on real data. Our experiments on publicly available scene text segmentation datasets show that our method outperforms existing methods.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2023.3292264