A Framework for Leveraging Interimage Information in Stereo Images for Enhanced Semantic Segmentation in Autonomous Driving
Saved in:
| Published in: | IEEE transactions on instrumentation and measurement, 2023, Vol. 72, p. 1-12 |
|---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | eng |
| Keywords: | |
| Online access: | Order full text |
| Summary: | Semantic segmentation is a crucial task with wide-ranging applications, including autonomous driving and robot navigation. However, prevailing state-of-the-art methods primarily focus on monocular images, neglecting the untapped potential of the stereo cameras commonly fitted to autonomous vehicles and robots, which capture binocular images. In this article, we introduce an innovative stereo-vision-based semantic segmentation framework that maximizes the utilization of stereo image data to enhance segmentation performance. Unlike conventional monocular approaches that use only one image, our method effectively uses both images, exploiting interimage correspondences and harnessing previously neglected information. Our core innovations encompass label generation for right images, combined with stereo-vision-based information fusion. For label generation, we propose a novel technique to accurately generate labels for the right images in stereo pairs, even in scenarios with no direct annotations. This approach empowers our models to learn from a complete stereo dataset, enhancing their semantic segmentation capabilities. In addition, our stereo-vision-based information fusion framework seamlessly integrates features and spatial disparities from the binocular images, enabling our models to produce more accurate and contextually enriched semantic segmentation outputs. To validate the efficacy of our proposed approach, we conduct comprehensive experiments on the Cityscapes and KITTI datasets using diverse, well-known semantic segmentation architectures. The results demonstrate the superiority and effectiveness of our method. |
ISSN: | 0018-9456 1557-9662 |
DOI: | 10.1109/TIM.2023.3328708 |
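The abstract describes generating labels for the right images of stereo pairs from left-image annotations via interimage correspondences. The paper's exact procedure is not given in this record, but for rectified stereo pairs a common first approximation is to warp the left label map to the right view using a per-pixel disparity map. The sketch below illustrates that idea only; the function name, the `ignore_index` convention, and the occlusion handling are illustrative assumptions, not the authors' method.

```python
import numpy as np

def warp_labels_left_to_right(left_labels, disparity, ignore_index=255):
    """Warp a left-view label map to the right view of a rectified stereo pair.

    For rectified stereo, a left pixel (y, x) with disparity d corresponds to
    right pixel (y, x - d). Right pixels with no valid source (out of frame or
    occluded) are set to ignore_index so a training loss can skip them.
    """
    h, w = left_labels.shape
    right_labels = np.full((h, w), ignore_index, dtype=left_labels.dtype)
    ys, xs = np.nonzero(disparity >= 0)        # pixels with a valid disparity
    order = np.argsort(disparity[ys, xs])      # write far pixels first ...
    ys, xs = ys[order], xs[order]              # ... so nearer ones overwrite
    xr = xs - np.round(disparity[ys, xs]).astype(int)
    inside = (xr >= 0) & (xr < w)
    right_labels[ys[inside], xr[inside]] = left_labels[ys[inside], xs[inside]]
    return right_labels
```

Sorting by disparity before writing resolves collisions in favor of the nearer (larger-disparity) surface, a simple stand-in for proper occlusion reasoning; disoccluded right-image regions simply remain `ignore_index`.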