SSNet: a joint learning network for semantic segmentation and disparity estimation


Bibliographic Details
Published in: The Visual Computer, 2025, Vol. 41 (1), pp. 423-435
Authors: Jia, Dayu; Pang, Yanwei; Cao, Jiale; Jing, Pan
Format: Article
Language: English
Online access: Full text
Description
Abstract: Joint learning of semantic segmentation and disparity estimation is adopted in scene parsing for mutual benefit. However, existing joint learning approaches unify the two tasks in a simplistic manner, which may result in negative feature mixing. To solve this problem, a win-win approach, the Stereo Semantic Network (SSNet), is proposed for pixel-wise scene parsing. SSNet is the first Transformer-based end-to-end joint learning model for semantic segmentation and disparity estimation. The main novelty lies in the proposed Transformer Feature Separation Module (TFSM), which separates the features used for segmentation prediction and disparity regression according to the characteristics of the two tasks. The segmentation and disparity outputs are supervised jointly with a weighted summation loss function to improve the performance of both tasks. Experimental results on the Cityscapes and KITTI 2015 datasets demonstrate that SSNet outperforms state-of-the-art joint learning approaches.
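The joint supervision described in the abstract combines the two task losses into a single training objective by weighted summation. A minimal sketch of that combination is shown below; the weight names `w_seg` and `w_disp` and their values are assumptions for illustration, as the record does not give the paper's actual loss terms or weights.

```python
def joint_loss(seg_loss: float, disp_loss: float,
               w_seg: float = 1.0, w_disp: float = 1.0) -> float:
    """Weighted summation of the segmentation and disparity losses.

    Hypothetical parameter names; the paper's actual weighting
    scheme is not specified in this record.
    """
    return w_seg * seg_loss + w_disp * disp_loss

# Example: equal weighting of a segmentation loss of 0.6
# and a disparity regression loss of 0.4
total = joint_loss(0.6, 0.4)  # -> 1.0
```

In practice the two weights balance gradient magnitudes between the tasks, so that neither the segmentation head nor the disparity head dominates training.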
ISSN: 0178-2789
eISSN: 1432-2315
DOI: 10.1007/s00371-024-03336-z