CSNet: Cross-Stage Subtraction Network for Real-Time Semantic Segmentation in Autonomous Driving
Saved in:
Published in: | IEEE Transactions on Intelligent Transportation Systems, 2024-12, p.1-16 |
---|---|
Main Authors: | , , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Abstract: | Learning multi-scale feature representations is essential for dense prediction tasks in autonomous driving. Most existing works are based on U-shaped architectures, where high-resolution representations are progressively recovered by connecting different levels of the decoder with low-resolution representations from the encoder. We observed that the rich details from low-level representations and the high semantic information from high-level representations are not fully utilized in the cross-stage fusion process. Additionally, current architectures often struggle to extract efficient discriminative features along object boundaries. To address these issues, we propose CSNet, a generic cross-stage subtraction network that extracts spatial and semantic multi-scale representations through guided contextual features. This approach allows fine-grained features to refine deeper layers, capturing discriminative high-resolution features while filtering out redundant information. Specifically, we introduce a cross-stage subtraction module (CSM), which consists of three sub-modules: 1) a Short Path Unit, focusing on capturing complementary adjacent information; 2) a Medium Path Unit for effective aggregation of middle-stage features; and 3) a Long Path Unit for redundant-information masking and long-range context modeling. Additionally, we propose the Semantic Guided Context Reasoning (SGCR) module to reason about and model contextual relations between the different subtraction units. CSNet demonstrates consistent performance gains across various semantic segmentation datasets. Our model, CSNet-M, achieves 82.2% mIoU on the CamVid dataset, while CSNet-S and CSNet-M attain 79.6% and 80.5% mIoU, respectively, on the Cityscapes dataset. These results show that the proposed CSNet has the potential to enhance real-time semantic segmentation in autonomous driving applications, offering improved accuracy and efficiency in diverse urban scenarios.
The source code for this work will be published at https://github.com/mohamedac29/CSNet. |
---|---|
ISSN: | 1524-9050, 1558-0016 |
DOI: | 10.1109/TITS.2024.3519162 |
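The abstract does not spell out the internals of the cross-stage subtraction module, but the core idea of subtraction-based fusion can be illustrated with a minimal sketch: upsample a coarse, high-level feature map to the resolution of a fine, low-level one, then take the element-wise absolute difference, so that regions where the two stages disagree (typically object boundaries) are highlighted while redundant information shared by both stages is suppressed. The function names and the single-channel, pure-Python setup below are illustrative assumptions, not the authors' implementation.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out

def subtraction_unit(f_low, f_high):
    """Hypothetical single-channel subtraction unit: bring the coarse
    high-level map up to the low-level resolution, then keep only the
    element-wise absolute difference between the two stages."""
    f_up = upsample_nearest(f_high)
    return [[abs(a - b) for a, b in zip(row_low, row_up)]
            for row_low, row_up in zip(f_low, f_up)]

# Toy example: a 4x4 low-level map and a 2x2 high-level map.
# The high-level map "agrees" with the top-left block, so only the
# bottom-right block (where the stages disagree) survives.
f_low = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
f_high = [[1, 0],
          [0, 0]]
print(subtraction_unit(f_low, f_high))
# → [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
```

In a real network the difference map would pass through convolution and normalization layers, and the three path units of the CSM would apply this idea across adjacent, middle, and distant stage pairs respectively.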