Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Monocular 3D object detection (Mono3D) has achieved unprecedented success
with the advent of deep learning techniques and emerging large-scale autonomous
driving datasets. However, drastic performance degradation remains an
understudied challenge for practical cross-domain deployment owing to the lack of
labels on the target domain. In this paper, we first comprehensively
investigate the significant underlying factor of the domain gap in Mono3D,
where the critical observation is a depth-shift issue caused by the geometric
misalignment of domains. Then, we propose STMono3D, a new self-teaching
framework for unsupervised domain adaptation on Mono3D. To mitigate the
depth-shift, we introduce the geometry-aligned multi-scale training strategy to
disentangle the camera parameters and guarantee the geometry consistency of
domains. Based on this, we develop a teacher-student paradigm to generate
adaptive pseudo labels on the target domain. Benefiting from the end-to-end
framework that provides richer information of the pseudo labels, we propose the
quality-aware supervision strategy to take instance-level pseudo confidences
into account and improve the effectiveness of the target-domain training
process. Moreover, a positive focusing training strategy and a dynamic
threshold are proposed to handle the large numbers of false-negative (FN) and
false-positive (FP) pseudo samples. STMono3D
achieves remarkable performance on all evaluated datasets and even surpasses
fully supervised results on the KITTI 3D object detection dataset. To the best
of our knowledge, this is the first study to explore effective UDA methods for
Mono3D. |
---|---|
DOI: | 10.48550/arxiv.2204.11590 |
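
The summary describes a teacher-student paradigm in which pseudo labels are filtered by a threshold and weighted by their instance-level confidence. The following is a minimal sketch of that general idea, not the STMono3D implementation. The `PseudoBox` structure, all function names, the exponential-moving-average (EMA) teacher update, and the confidence-weighted depth loss are illustrative assumptions that the record does not confirm. The threshold is passed in as a plain number; how the paper adapts it dynamically is not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PseudoBox:
    """A teacher prediction on a target-domain image (hypothetical structure)."""
    depth: float       # predicted object depth in metres
    confidence: float  # instance-level confidence in [0, 1]


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.999) -> List[float]:
    """Move teacher weights towards student weights.

    EMA is a common choice in teacher-student setups and is an
    assumption here, not something the summary specifies.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def select_pseudo_labels(boxes: List[PseudoBox],
                         threshold: float) -> List[PseudoBox]:
    """Keep only teacher predictions whose confidence reaches the threshold."""
    return [b for b in boxes if b.confidence >= threshold]


def quality_weighted_loss(pred_depths: List[float],
                          pseudo: List[PseudoBox]) -> float:
    """Confidence-weighted mean absolute depth error.

    Each pseudo label contributes in proportion to its confidence, so
    low-quality pseudo labels influence the student less.
    """
    if not pseudo:
        return 0.0
    weighted = sum(b.confidence * abs(p - b.depth)
                   for p, b in zip(pred_depths, pseudo))
    return weighted / sum(b.confidence for b in pseudo)


# Usage: filter teacher outputs, then score student predictions against them.
teacher_boxes = [PseudoBox(10.0, 0.9), PseudoBox(20.0, 0.3), PseudoBox(30.0, 0.6)]
kept = select_pseudo_labels(teacher_boxes, threshold=0.5)
loss = quality_weighted_loss([11.0, 28.0], kept)
```

In this sketch the low-confidence box at 20 m is discarded before the loss is computed, and the remaining two boxes are weighted 0.9 and 0.6 respectively.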