A Framework for Leveraging Interimage Information in Stereo Images for Enhanced Semantic Segmentation in Autonomous Driving


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, 2023, Vol. 72, pp. 1-12
Authors: Sun, Libo; Bockman, James; Sun, Changming
Format: Article
Language: English

Description
Abstract: Semantic segmentation is a crucial task with wide-ranging applications, including autonomous driving and robot navigation. However, prevailing state-of-the-art methods primarily focus on monocular images, neglecting the untapped potential of the stereo cameras with which autonomous vehicles and robots are commonly equipped, which capture binocular images. In this article, we aim to introduce an innovative stereo-vision-based semantic segmentation framework that maximizes the utilization of stereo image data to enhance segmentation performance. Unlike conventional monocular approaches that use only one image, our method effectively uses both images, exploiting interimage correspondences and harnessing previously neglected information. Our core innovations encompass label generation for right images, combined with stereo-vision-based information fusion. For label generation, we propose a novel technique to accurately generate labels for the right images in stereo pairs, even in scenarios with no direct annotations. This approach empowers our models to learn from a complete stereo dataset, enhancing their semantic segmentation capabilities. In addition, our stereo-vision-based information fusion framework seamlessly integrates features and spatial disparities from the binocular images, enabling our models to produce more accurate and contextually enriched semantic segmentation outputs. To validate the efficacy of our proposed approach, we conduct comprehensive experiments on the Cityscapes and KITTI datasets using diverse, well-known semantic segmentation architectures. The results demonstrate the superiority and effectiveness of our method.
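The abstract does not detail the paper's label-generation procedure. A common way to obtain right-image labels from left-image annotations in a rectified stereo pair, sketched below purely as an illustration of the general idea, is to forward-warp the left labels using a disparity map, resolving occlusions with a simple z-buffer (nearer pixels, i.e., larger disparities, win). The function name and the `ignore_index` convention are assumptions, not the authors' API.

```python
import numpy as np

def warp_labels_left_to_right(left_labels, left_disparity, ignore_index=255):
    """Forward-warp per-pixel class labels from the left image to the right
    image of a rectified stereo pair using the left disparity map.

    In a rectified pair, a left pixel (x, y) with disparity d corresponds to
    right pixel (x - d, y). Right pixels receiving no correspondence
    (occluded in the left view) are set to ignore_index.
    """
    h, w = left_labels.shape
    right_labels = np.full((h, w), ignore_index, dtype=left_labels.dtype)

    ys, xs = np.indices((h, w))
    target_x = xs - np.round(left_disparity).astype(int)

    # Sort source pixels far-to-near (ascending disparity) so that when
    # several left pixels land on the same right pixel, the nearest one
    # (written last) overwrites the others -- a simple z-buffer.
    order = np.argsort(left_disparity, axis=None)
    ys_f = ys.ravel()[order]
    xs_f = xs.ravel()[order]
    tx_f = target_x.ravel()[order]

    valid = (tx_f >= 0) & (tx_f < w)
    right_labels[ys_f[valid], tx_f[valid]] = left_labels[ys_f[valid], xs_f[valid]]
    return right_labels
```

For example, with a 1x5 label row `[1, 2, 3, 4, 5]` and disparities `[0, 0, 2, 0, 0]`, the pixel at x=2 (disparity 2) wins the conflict at right x=0, and right x=2 receives no source pixel, so the warped row is `[3, 2, 255, 4, 5]`.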
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2023.3328708