Rethinking DABNet: Light-Weight Network for Real-Time Semantic Segmentation of Road Scenes

Bibliographic details
Published in: IEEE Transactions on Artificial Intelligence, 2024-06, Vol. 5 (6), p. 3098-3108
Main authors: Mazhar, Saquib; Atif, Nadeem; Bhuyan, M. K.; Ahamed, Shaik Rafi
Format: Article
Language: English
Description
Abstract: Recent advancements in autonomous driving and mobile devices have led to the development of real-time, lightweight semantic image segmentation models. However, these models typically lose accuracy compared with large networks. DABNet (Li et al., 2019) presented a highly efficient method to balance the tradeoff between accuracy and model size. Nevertheless, the bottleneck structure and single-scale receptive field of its building block limit performance for the given network size. To further improve the segmentation score and reduce the number of parameters, the basic block is redesigned using an inverted-residual and dilation pyramid (IRDP) structure. The IRDP module can efficiently learn contextual features at multiple dilations within the block. Using the inverted-residual structure with an expansion layer prevents information loss due to the dimensionality reduction of the feature space. The IRDP block is used to rebuild the DABNet structure so that it works in real time on resource-constrained devices. In addition, a fast-lightweight decoder (FLD) is proposed to improve the segmentation accuracy of the network. Experiments on the Cityscapes and Cambridge-driving Labeled Video Database (CamVid) datasets demonstrate the effectiveness of the proposed approach. On Cityscapes, IRDPNet achieves a mean Intersection-over-Union (mIoU) of 75.62%, while the lighter version reaches an mIoU of 71.32% with only 0.32 million parameters, which is similar to DABNet's accuracy with half the number of parameters.
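
The abstract reports its results as mean Intersection-over-Union (mIoU). As a rough illustration of how that metric is commonly computed, the sketch below derives per-class IoU = TP / (TP + FP + FN) from a confusion matrix and averages over classes. The function name `mean_iou`, the toy matrix, and the choice to skip classes that never occur are assumptions made for this example; the record does not state the authors' exact evaluation protocol.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix of pixel counts.

    Rows are ground-truth classes, columns are predicted classes.
    Illustrative helper only, not the authors' evaluation code.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                      # pixels correctly labeled c
        fn = sum(confusion[c]) - tp               # class-c pixels labeled otherwise
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # others labeled c
        denom = tp + fp + fn
        if denom == 0:
            continue  # assumption: skip classes absent from both truth and prediction
        ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Hypothetical 3-class pixel counts, made up for illustration.
toy = [
    [50, 5, 0],
    [10, 30, 0],
    [0, 5, 20],
]
print(round(mean_iou(toy), 4))
```

Here the per-class IoUs are 50/65, 30/50 and 20/25, so a single weak class pulls the mean down noticeably, which is why mIoU is a stricter measure than overall pixel accuracy.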
ISSN: 2691-4581
DOI: 10.1109/TAI.2023.3341976