DCUFormer: Enhancing pavement crack segmentation in complex environments with dual-cross/upsampling attention

Full description

Bibliographic details
Published in:	Expert systems with applications, 2025-03, Vol. 264, p. 125891, Article 125891
Main authors:	Shan, Jinhuan; Huang, Yue; Jiang, Wei
Format:	Article
Language:	English
Keywords:	Feature upsampling; Pavement crack; Semantic segmentation; Vision transformer
Online access:	Full text

Description
Abstract:	Efficient road inspection and maintenance are essential to extend pavement lifespan and enhance safety. However, automated crack detection remains challenging because environmental conditions and image collection equipment vary widely, which makes robust algorithms a critical need. Vision Transformers capture long-range dependencies and extract global features effectively, which offers significant advantages for crack detection in complex scenarios. Nevertheless, existing Transformer-based methods struggle with boundary delineation because limitations in decoder design lead to suboptimal fusion of low-level and high-level features. To address this issue, we propose a comprehensive approach that integrates semantic preservation, detail refinement, and detail delineation. These concepts are realized through our novel Dual-Cross Attention Module (DCA) and Upsampling Attention Module (UA). The DCA module uses high-level semantic information to progressively filter redundant details from low-level feature layers, while preserving boundary details that refine the boundaries of high-level features. In addition, the UA module employs progressive local cross-attention during upsampling, which yields more precise boundary definitions and surpasses conventional dynamic upsampling methods. Our approach, evaluated with both lightweight (MiT-B0, LVT) and middleweight (Swin-T) backbones, demonstrates state-of-the-art performance on three diverse datasets (Crack500, CrackSC, and UAV-Crack500), highlighting its robustness across varied conditions. This work contributes to advancing Transformer-based architectures for defect segmentation in complex engineering contexts and underscores the critical role of improved feature fusion in crack detection. The code is available at: https://github.com/SHAN-JH/DCUFormer.
ISSN:	0957-4174
DOI:	10.1016/j.eswa.2024.125891
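
Illustrative note: the abstract describes fusing low-level and high-level features through cross-attention but does not specify the internals of the DCA or UA modules. The sketch below is therefore only generic single-head scaled dot-product cross-attention on toy data, not the authors' implementation. The function names, the toy feature vectors, and the choice of low-level features as queries with high-level features as keys and values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    Each query vector attends over every key and returns the
    attention-weighted average of the corresponding value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy data (assumed): 2 coarse "high-level" tokens, 4 fine "low-level" tokens.
high_level = [[1.0, 0.0], [0.0, 1.0]]
low_level = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Fine tokens query the coarse tokens, so each fine position receives
# a mixture of the coarse semantic vectors.
refined = cross_attention(low_level, high_level, high_level)
```

In this toy setup each fine token ends up weighted toward the coarse token it most resembles. The published code at the repository named in the abstract is the authoritative reference for how the actual modules are built.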