PE-MCAT: Leveraging Image Sensor Fusion and Adaptive Thresholds for Semi-Supervised 3D Object Detection

Bibliographic Details
Published in:	Sensors (Basel, Switzerland), 2024-10, Vol.24 (21), p.6940
Main Authors:	Li, Bohao; Song, Shaojing; Ai, Luxia
Format:	Article
Language:	English
Subjects:	Accuracy; adaptive threshold; Algorithms; Artificial intelligence; Equipment and supplies; Image processing; Learning; Localization; Methods; multi-feature fusion; Optical radar; point enrichment; pseudo-label; Remote sensing; Semantics; semi-supervised learning; Sensors; Teachers
Online Access:	Full text

Description
Summary:	Existing 3D object detection frameworks in sensor-based applications rely heavily on large-scale annotated data to achieve optimal performance. However, obtaining such annotations from sensor data, such as LiDAR or image sensor data, is both time-consuming and costly. Semi-supervised learning offers an efficient solution to this challenge and holds significant potential for sensor-driven artificial intelligence (AI) applications. While it reduces the need for labeled data, semi-supervised learning still depends on a small number of labeled samples for training. In the initial stages, relying on such limited samples can adversely affect the training of student-teacher networks. In this paper, we propose PE-MCAT, a semi-supervised 3D object detection method that generates high-precision pseudo-labels. First, to address the challenges of insufficient local feature capture and poor robustness in point cloud data, we introduce a point enrichment module. This module incorporates information from image sensors and combines multiple feature fusion methods of local and self-features to directly enhance the quality of point clouds and pseudo-labels, compensating for the limitations posed by using only a few labeled samples. Second, we explore the relationship between the teacher network and the pseudo-labels it generates. We propose a multi-class adaptive threshold strategy to initially filter and create a high-quality pseudo-label set. Furthermore, a joint variable threshold strategy is introduced to refine this set further, enhancing the selection of superior pseudo-labels. Extensive experiments demonstrate that PE-MCAT consistently outperforms recent state-of-the-art methods across different datasets. Specifically, on the KITTI dataset and using only 2% of labeled samples, our method improved the mean Average Precision (mAP) by 0.7% for cars, 3.7% for pedestrians, and 3.0% for cyclists.
ISSN:	1424-8220
DOI:	10.3390/s24216940