Enhancing collaborative road scene reconstruction with unsupervised domain alignment

Scene reconstruction and visual localization in dynamic environments such as street scenes are a challenge due to the lack of distinctive, stable keypoints. While learned convolutional features have proven to be robust to changes in viewing conditions, handcrafted features still have advantages in d...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Machine vision and applications 2021, Vol.32 (1), Article 13
Hauptverfasser: Venator, Moritz, Aklanoglu, Selcuk, Bruns, Erich, Maier, Andreas
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Scene reconstruction and visual localization in dynamic environments such as street scenes are a challenge due to the lack of distinctive, stable keypoints. While learned convolutional features have proven to be robust to changes in viewing conditions, handcrafted features still have advantages in distinctiveness and accuracy when applied to structure from motion. For collaborative reconstruction of road sections by a car fleet, we propose to use multimodal domain adaptation as a preprocessing step to align images in their appearance and enhance keypoint matching across viewing conditions while preserving the advantages of handcrafted features. Training a generative adversarial network for translations between different illumination and weather conditions, we evaluate qualitative and quantitative aspects of domain adaptation and its impact on feature correspondences. Combined with a multi-feature discriminator, the model is optimized for synthesis of images which do not only improve feature matching but also exhibit a high visual quality. Experiments with a challenging multi-domain dataset recorded in various road scenes on multiple test drives show that our approach outperforms other traditional and learning-based methods by improving completeness or accuracy of structure from motion with multimodal two-domain image collections in eight out of ten test scenes.
ISSN:0932-8092
1432-1769
DOI:10.1007/s00138-020-01144-8