SDDS-Net: Space and Depth Encoder-Decoder Convolutional Neural Networks for Real-Time Semantic Segmentation

Bibliographic details
Published in: IEEE Access 2023, Vol. 11, p. 119362-119372
Main authors: Ibrahem, Hatem; Salem, Ahmed; Kang, Hyun-Soo
Format: Article
Language: English
Description
Summary: In this paper, we propose novel convolutional encoder-decoder architectures for real-time semantic segmentation based on an image-to-image translation approach that uses space-to-depth and depth-to-space modules. The architectures compress the spatial information of the image with the space-to-depth (SD) module instead of the commonly used pooling methods (max pooling and average pooling) or strided convolutions. The SD module reduces the image size while preserving the spatial information of the image in the form of extra depth information; this makes it preferable to pooling approaches, which lose information and image detail. We also propose a lightweight and simple decoder stage using the depth-to-space (DS) module, which constructs a high-resolution dense prediction map from a large number of low-resolution feature maps. The proposed architectures learn image classification and semantic segmentation efficiently, with high accuracy and average processing speed. We trained and tested them on image classification benchmarks (CIFAR10 and Tiny ImageNet) and on indoor and outdoor semantic segmentation benchmarks (NYU-depthV2 and CITYSCAPES, respectively). The proposed architectures attain high classification accuracy (94.28% on CIFAR10 and 72.25% on Tiny ImageNet) and high mean average precision and pixel accuracy in semantic segmentation (pixel accuracy of 78.55% on NYU-depthV2 and 87.9% on CITYSCAPES) while maintaining real-time frame processing speed, outperforming recent state-of-the-art methods in semantic segmentation.
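The space-to-depth and depth-to-space rearrangements named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the block size `r`, the height x width x channels nested-list layout, and the channel ordering inside each block are assumptions made for illustration, since the summary does not specify them. The sketch only shows that the rearrangement shrinks the spatial size without discarding values and can be inverted exactly.

```python
def space_to_depth(x, r):
    """Rearrange an H x W x C nested list into (H/r) x (W/r) x (C*r*r)."""
    h, w = len(x), len(x[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    return [
        [
            # Concatenate the channels of every pixel in the r x r block
            # (row-major block order is an assumption of this sketch).
            [v
             for di in range(r)
             for dj in range(r)
             for v in x[i * r + di][j * r + dj]]
            for j in range(w // r)
        ]
        for i in range(h // r)
    ]


def depth_to_space(y, r):
    """Inverse rearrangement: H x W x (C*r*r) -> (H*r) x (W*r) x C."""
    h, w, d = len(y), len(y[0]), len(y[0][0])
    if d % (r * r):
        raise ValueError("depth must be divisible by r*r")
    c = d // (r * r)
    out = [[None] * (w * r) for _ in range(h * r)]
    for i in range(h):
        for j in range(w):
            for di in range(r):
                for dj in range(r):
                    # Each group of c channels goes back to one pixel.
                    start = (di * r + dj) * c
                    out[i * r + di][j * r + dj] = y[i][j][start:start + c]
    return out


# A 4 x 4 single-channel image with values 0..15.
image = [[[4 * row + col] for col in range(4)] for row in range(4)]
packed = space_to_depth(image, 2)     # 2 x 2 x 4, all 16 values kept
restored = depth_to_space(packed, 2)  # back to 4 x 4 x 1
```

Here `packed` holds the same 16 values as `image` at half the spatial resolution, whereas 2 x 2 max pooling on the same input would keep only 4 of them.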
ISSN:2169-3536
DOI:10.1109/ACCESS.2023.3327323