A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Multi-scale architectures, including hierarchical vision transformers, have been
commonly applied to high-resolution semantic segmentation to manage
computational complexity with minimal performance loss. In this paper, we
propose a novel decoding scheme for semantic segmentation that
takes multi-level features from an encoder with a multi-scale architecture. The
decoding scheme, based on a multi-level vision transformer, aims to achieve not
only reduced computational expense but also higher segmentation accuracy by
introducing successive cross-attention in the aggregation of the multi-level
features. Furthermore, a way to enhance the multi-level features with the
aggregated semantics is proposed. The effort focuses on maintaining
contextual consistency from the perspective of attention allocation and brings
improved performance at significantly lower computational cost. A set of
experiments on popular datasets demonstrates the superiority of the proposed scheme
over state-of-the-art semantic segmentation models in terms of computational
cost without loss of accuracy, and extensive ablation studies prove the
effectiveness of the proposed ideas. |
---|---|
DOI: | 10.48550/arxiv.2402.11201 |
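
The summary mentions successive cross-attention over multi-level features without giving details. The following is only a minimal sketch of that general idea, not the paper's decoder. It assumes identity query/key/value projections, a single head, a common channel width across levels, a coarse-to-fine order in which the running aggregate supplies the queries, and a residual sum. The names `cross_attention` and `aggregate_successively` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product cross-attention.

    Assumption: identity projections (no learned weights), single head.
    queries: list of N_q vectors, context: list of N_k vectors, same width d.
    Returns N_q vectors, each a weighted average of the context vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


def aggregate_successively(levels):
    """Fold feature levels into one aggregate, one level at a time.

    Assumption: levels are ordered coarse to fine and already share a
    channel width; the running aggregate acts as the queries, so the
    output keeps the token count of the first (coarsest) level.
    """
    agg = levels[0]
    for level in levels[1:]:
        attended = cross_attention(agg, level)
        # Residual sum of the aggregate and what it attended to.
        agg = [[a + b for a, b in zip(x, y)] for x, y in zip(agg, attended)]
    return agg


# Toy input: 2 coarse tokens, 4 mid tokens, 8 fine tokens, width 3.
levels = [
    [[0.1 * (i + j) for j in range(3)] for i in range(n)]
    for n in (2, 4, 8)
]
fused = aggregate_successively(levels)
print(len(fused), len(fused[0]))
```

Because the queries always come from the coarsest level in this sketch, each attention step costs on the order of N_coarse × N_level score computations rather than N_level squared, which is one plausible way such a scheme could reduce computation.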