MUSTER: A Multi-scale Transformer-based Decoder for Semantic Segmentation
Saved in:
Main Authors: , , , ,
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Summary: In recent works on semantic segmentation, there has been a significant focus on designing and integrating transformer-based encoders. However, less attention has been given to transformer-based decoders. We emphasize that the decoder stage is just as vital as the encoder in achieving superior segmentation performance: it disentangles and refines high-level cues, enabling precise object boundary delineation at the pixel level. In this paper, we introduce a novel transformer-based decoder called MUSTER, which seamlessly integrates with hierarchical encoders and consistently delivers high-quality segmentation results, regardless of the encoder architecture. Furthermore, we present a variant of MUSTER that reduces FLOPs while maintaining performance. MUSTER incorporates carefully designed multi-head skip attention (MSKA) units and introduces innovative upsampling operations. The MSKA units fuse multi-scale features from the encoder and decoder, facilitating comprehensive information integration. The upsampling operation leverages encoder features to enhance object localization and surpasses traditional upsampling methods, improving mIoU (mean Intersection over Union) by 0.4% to 3.2%. On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88, on par with the current state-of-the-art model. Remarkably, we achieve this while reducing the number of FLOPs by 61.3%. Our source code and models are publicly available at: https://github.com/shiwt03/MUSTER.
DOI: 10.48550/arxiv.2211.13928
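
The abstract describes MSKA units that fuse multi-scale encoder and decoder features through attention, followed by an encoder-guided upsampling step. Below is a minimal PyTorch sketch of one such skip-attention fusion block, assuming decoder features act as queries and same-scale encoder skip features as keys/values. The module name, tensor shapes, and use of nn.MultiheadAttention are illustrative assumptions, not the authors' implementation (see the linked repository for that), and the paper's learned upsampling is approximated here with plain bilinear interpolation.

```python
# Hedged sketch of a skip-attention fusion step in the spirit of MSKA.
# Not the authors' code; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipAttentionFusion(nn.Module):
    """Fuses a decoder feature map with a same-scale encoder skip feature.

    Decoder features act as queries; encoder features provide keys/values,
    so encoder cues guide the refinement of the decoder stream.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # dec_feat, enc_feat: (B, C, H, W) at the same spatial scale.
        b, c, h, w = dec_feat.shape
        q = dec_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) decoder queries
        kv = enc_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) encoder keys/values
        attn_out, _ = self.attn(self.norm_q(q), self.norm_kv(kv), self.norm_kv(kv))
        x = q + attn_out          # residual cross-attention
        x = x + self.mlp(x)       # position-wise refinement
        return x.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    fuse = SkipAttentionFusion(dim=256)
    dec = torch.randn(2, 256, 16, 16)  # coarse decoder features
    enc = torch.randn(2, 256, 16, 16)  # encoder skip features at the same scale
    out = fuse(dec, enc)
    # Move toward the next (finer) decoder stage; bilinear interpolation stands
    # in for the paper's encoder-guided upsampling operation.
    out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
    print(out.shape)  # torch.Size([2, 256, 32, 32])
```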