RAFT-MSF: Self-Supervised Monocular Scene Flow Using Recurrent Optimizer


Bibliographic Details
Published in: International Journal of Computer Vision, 2023-11, Vol. 131 (11), pp. 2757-2769
Main authors: Bayramli, Bayram; Hur, Junhwa; Lu, Hongtao
Format: Article
Language: English
Online Access: Full text
Description
Abstract: A popular approach to estimating scene flow is to use point cloud data from Lidar scans, whereas comparatively little attention has been paid to learning 3D motion from camera images. Learning scene flow from a monocular camera remains challenging due to its ill-posedness and the lack of annotated data. Self-supervised methods have demonstrated scene flow estimation from unlabeled data, yet their accuracy lags behind that of (semi-)supervised methods. In this paper, we introduce a self-supervised monocular scene flow method that substantially improves accuracy over previous approaches. Based on RAFT, a state-of-the-art optical flow model, we design a new decoder that iteratively updates 3D motion fields and disparity maps simultaneously. Furthermore, we propose an enhanced upsampling layer and a disparity initialization technique, which together improve accuracy by up to 7.2%. Our method achieves state-of-the-art accuracy among all self-supervised monocular scene flow methods, improving accuracy by 34.2%. Our fine-tuned model outperforms the best previous semi-supervised method while running 228 times faster. Code will be made publicly available to ensure reproducibility.
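The abstract describes a RAFT-style decoder that iteratively refines a disparity map and a 3D motion field in lockstep. The sketch below illustrates only that iterative joint-update structure, not the authors' actual model: in RAFT-MSF the residual updates come from a learned recurrent (GRU-based) operator conditioned on correlation features, whereas here a simple damped step toward stand-in targets serves as a placeholder update, and all names (`iterative_refine`, `num_iters`, `step`) are hypothetical.

```python
import numpy as np

def iterative_refine(disparity, motion, target_disp, target_motion,
                     num_iters=8, step=0.5):
    """Jointly refine a disparity map (H, W) and a 3D motion field (H, W, 3).

    Placeholder for a RAFT-style recurrent decoder: each iteration
    predicts residual updates for both quantities and applies them
    simultaneously. In the real model the residuals come from a
    learned update operator, not from known targets.
    """
    for _ in range(num_iters):
        # Placeholder "update operator": a damped step toward the target.
        delta_disp = step * (target_disp - disparity)
        delta_motion = step * (target_motion - motion)
        # Simultaneous update of both estimates, as in the joint decoder.
        disparity = disparity + delta_disp
        motion = motion + delta_motion
    return disparity, motion

H, W = 4, 5
d0 = np.zeros((H, W))           # initial disparity estimate
m0 = np.zeros((H, W, 3))        # initial 3D motion field
d_t = np.ones((H, W))           # stand-in disparity target
m_t = np.full((H, W, 3), 2.0)   # stand-in motion target

d, m = iterative_refine(d0, m0, d_t, m_t)
```

After eight iterations with a damping factor of 0.5, the remaining residual shrinks by a factor of 2^8, so both estimates are within about 1% of their targets; the point is that a fixed number of cheap recurrent updates converges quickly, which is what makes RAFT-style decoders fast at inference time.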
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-023-01828-4