Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Intelligent Transportation Systems, 2022-10, Vol. 23 (10), pp. 19224-19235
Main Authors: Tang, Quan, Liu, Fagui, Zhang, Tong, Jiang, Jun, Zhang, Yu, Zhu, Boyuan, Tang, Xuhao
Format: Article
Language: English
Subjects:
Online Access: Request full text
Description
Abstract: Semantic segmentation plays a critical role in scene understanding for self-driving vehicles. A line of work has shown that global context matters in urban scene segmentation because of massive scale changes. However, we find that existing methods suffer from local ambiguities when they dissipate continuous local context, i.e., when they scramble for a huge receptive field of global cues through coarse pooling. To this end, this paper proposes a new Context Aggregation Module (CAM) consisting of two primary components: context encoding, which replaces coarse pooling with encoder-decoders at appropriate sampling scales, and gated fusion, which extends the gate attention mechanism to balance context from different scales during feature fusion. Weeding out coarse pooling and applying encoder-decoders inherits the merit of exploring global context while avoiding the drawback of losing local contextual continuity. We then construct a Context Aggregation Network (CANet) and conduct extensive evaluations on the challenging autonomous driving benchmarks Cityscapes, CamVid and BDD100K. Consistently improved results evidence its effectiveness. Notably, we attain a competitive mIoU of 82.7% on Cityscapes and a best mIoU of 80.5% on CamVid.
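The gated-fusion idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the authors' implementation: it assumes an element-wise sigmoid gate that blends a local-scale and a global-scale feature map, whereas in the paper the gate would be predicted by a learned layer.

```python
import numpy as np

def sigmoid(x):
    """Numerically plain logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, global_feat, gate_logits):
    """Blend two same-shape feature maps with an element-wise sigmoid gate.

    A gate value near 1 favors the local-scale context, near 0 the
    global-scale context. Hypothetical signature for illustration only:
    in a real network the gate logits would come from a small conv layer.
    """
    gate = sigmoid(gate_logits)
    return gate * local_feat + (1.0 - gate) * global_feat

# Toy 1-element "feature maps": gate_logits = 0 gives an even 0.5/0.5 blend.
fused = gated_fusion(np.array([2.0]), np.array([0.0]), np.array([0.0]))
print(fused)  # → [1.]
```

The design choice the abstract argues for is that such a learned, per-position blend can trade off coarse global cues against fine local ones, rather than committing to a single pooled receptive field.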
ISSN: 1524-9050
EISSN: 1558-0016
DOI: 10.1109/TITS.2022.3157128