SSNet: a joint learning network for semantic segmentation and disparity estimation
Saved in:
Published in: | The Visual computer 2025, Vol.41 (1), p.423-435 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Joint learning of semantic segmentation and disparity estimation has been adopted for scene parsing, to the mutual benefit of both tasks. However, existing joint learning approaches unify the two tasks superficially, which may result in negative feature mixing. To solve this problem, a win–win approach, the Stereo Semantic Network (SSNet), is proposed for pixel-wise scene parsing. SSNet is the first Transformer-based end-to-end joint learning model for semantic segmentation and disparity estimation. The main novelty lies in the proposed Transformer Feature Separation Module (TFSM), which is designed to separate features for segmentation prediction and disparity regression according to the characteristics of the two tasks. The segmentation and disparity results are supervised jointly with a weighted summation loss function to improve the performance of both tasks. Experimental results on the Cityscapes and KITTI 2015 datasets demonstrate that SSNet outperforms state-of-the-art joint learning approaches. |
---|---|
ISSN: | 0178-2789 1432-2315 |
DOI: | 10.1007/s00371-024-03336-z |
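
The abstract's joint supervision with a weighted summation loss can be sketched as below; the weight values and function name are illustrative assumptions, since the paper's actual weights are not given in the abstract.

```python
def joint_loss(seg_loss: float, disp_loss: float,
               w_seg: float = 1.0, w_disp: float = 1.0) -> float:
    """Weighted sum of the segmentation and disparity losses, as the
    SSNet abstract describes; weights here are assumed, not from the paper."""
    return w_seg * seg_loss + w_disp * disp_loss

# Example: equal weighting of a segmentation loss of 2.0 and a
# disparity regression loss of 3.0 yields a total loss of 5.0.
total = joint_loss(2.0, 3.0)
```

In practice the two terms would typically be a cross-entropy loss over segmentation logits and a regression loss (e.g. smooth L1) over predicted disparities, with the weights tuned so neither task dominates training.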