UFVL-Net: A Unified Framework for Visual Localization Across Multiple Indoor Scenes

Bibliographic details
Published in: IEEE Transactions on Instrumentation and Measurement, 2023, Vol. 72, p. 1-16
Main authors: Xie, Tao; Jiang, Zhiqiang; Li, Shuozhan; Zhang, Yukun; Dai, Kun; Wang, Ke; Li, Ruifeng; Zhao, Lijun
Format: Article
Language: English
Description
Summary: Scene coordinate regression (SCoRe) approaches for visual localization have recently been investigated extensively. However, current SCoRe methods are scene-specific and require retraining to generalize to new scenarios, so the required model capacity rises steadily as the number of scenes increases. To address this, we develop UFVL-Net, a unified framework that integrates the localization tasks of multiple indoor scenarios into a single manageable network and optimizes these tasks jointly across diverse scene domains, treating the localization of each scene domain as a separate task. UFVL-Net is storage-efficient because multiple models with shared parameters can be consolidated into one. Specifically, we introduce two parameter-sharing policies, a channel-wise sharing policy (CSP) and a kernel-wise sharing policy, which offer fine-grained parameter sharing within each layer of the backbone for efficient storage while providing task-specific parameters to tackle the inherent difficulty of multidomain learning for visual localization: gradient conflict caused by skewed competition among tasks for the shared parameters. The key insight is that task-sharing parameters learn a generic feature representation across scenes, while task-specific parameters learn task-related features that alleviate gradient conflict. Moreover, we develop a sign-based gradient normalization (SIGGrad) technique, applied to the task-sharing parameters, that aids the training of UFVL-Net by further mitigating gradient conflict, thereby emphasizing the use of task-sharing parameters and ensuring that each task is thoroughly optimized. Extensive experiments across numerous datasets and complex real-world scenarios show that the UFVL-Net family significantly outperforms cutting-edge methods while using much less storage space. We also demonstrate that UFVL-Net can be generalized to new scenarios using a few task-specific parameters, further highlighting its superiority. The code is publicly available.
ISSN: 0018-9456, 1557-9662
DOI:10.1109/TIM.2023.3315406
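
The storage argument behind the channel-wise sharing idea in the summary can be illustrated with a small Python sketch. This is not the authors' implementation: the `SharedLayer` class, the scene names, and the split of two shared and two task-specific channels are hypothetical, chosen only to show how one stored layer might serve several scenes when some channels are shared and the rest are kept per task.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLayer:
    """Toy layer: each output channel is either shared by all tasks or owned by one task."""
    shared: dict                                   # channel index -> weight list used by every task
    specific: dict = field(default_factory=dict)   # task name -> {channel index -> weight list}

    def weights_for(self, task):
        """Assemble the full per-channel weights seen by one task."""
        merged = dict(self.shared)
        merged.update(self.specific.get(task, {}))
        return [merged[c] for c in sorted(merged)]

    def stored_values(self):
        """Count the scalar weights that actually need to be stored."""
        total = sum(len(w) for w in self.shared.values())
        for channels in self.specific.values():
            total += sum(len(w) for w in channels.values())
        return total


# Hypothetical setup (an assumption for illustration): 4 channels of 3 weights, 2 scenes.
layer = SharedLayer(
    shared={0: [0.1, 0.2, 0.3], 1: [0.4, 0.5, 0.6]},
    specific={
        "office": {2: [0.5, 0.5, 0.5], 3: [0.9, 0.8, 0.7]},
        "kitchen": {2: [0.0, 0.1, 0.0], 3: [0.3, 0.3, 0.3]},
    },
)

separate = 2 * 4 * 3  # two independent models, each storing all 4 channels
print(layer.weights_for("office"))
print("stored:", layer.stored_values(), "vs separate models:", separate)
```

In this toy setup the consolidated layer stores 18 values where two separate models would store 24, and adding a further scene would cost only its task-specific channels. How the paper actually selects shared channels is not described in this record.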