A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
Saved in:
Main Authors: | , , , , , , , , |
Format: | Article |
Language: | English |
Online Access: | Order full text |
Abstract: | In this paper, we propose a novel multi-view stereo (MVS) framework that
eliminates the need for a depth-range prior. Unlike recent prior-free MVS
methods that operate in a pair-wise manner, our method considers all the
source images simultaneously. Specifically, we introduce a Multi-view
Disparity Attention (MDA) module to aggregate long-range context information
within and across multi-view images. Given the asymmetry of the epipolar
disparity flow, the key to our method lies in accurately modeling multi-view
geometric constraints. We integrate a pose embedding that encapsulates
multi-view camera pose information, providing implicit geometric constraints
for the attention-driven fusion of multi-view disparity features.
Additionally, because the observation quality of a given reference-frame
pixel varies significantly across source frames, we maintain a separate
hidden state for each source image. We explicitly estimate the observation
quality of the current pixel at sampled points along the epipolar line of
each source image and dynamically update the hidden states through an
uncertainty estimation module. Extensive results on the DTU dataset and the
Tanks & Temples benchmark demonstrate the effectiveness of our method.
The code is available at our project page:
https://zju3dv.github.io/GD-PoseMVS/. |
DOI: | 10.48550/arXiv.2411.01893 |
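
For readers who want a concrete picture of the mechanisms the abstract names, below is a minimal, hypothetical PyTorch sketch of a multi-view disparity attention block with a pose embedding and an uncertainty-gated per-view hidden state. Every name, shape, and design choice here (flattened relative [R|t] poses as the pose input, a GRU-style update, sigmoid quality gating) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of the ideas described in the abstract, not the
# authors' code: attention over per-source-view disparity features, with a
# pose embedding added as an implicit geometric cue, and a per-view hidden
# state updated in proportion to a predicted observation-quality score.
import torch
import torch.nn as nn

class MultiViewDisparityAttention(nn.Module):
    """Aggregates disparity features across N source views for each
    reference pixel, conditioned on a learned camera-pose embedding."""
    def __init__(self, dim: int, num_heads: int = 4, pose_dim: int = 12):
        super().__init__()
        # Relative pose (e.g. flattened 3x4 [R|t]) -> feature-space embedding.
        self.pose_embed = nn.Linear(pose_dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Predicts a per-view quality score used to gate hidden-state updates.
        self.quality = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, 1), nn.Sigmoid())
        self.update = nn.GRUCell(dim, dim)

    def forward(self, ref_feat, src_feats, rel_poses, hidden):
        # ref_feat:  (B, dim)         reference-pixel feature (the query)
        # src_feats: (B, N, dim)      matched features from N source views
        # rel_poses: (B, N, pose_dim) relative reference->source poses
        # hidden:    (B, N, dim)      one hidden state per source view
        B, N, D = src_feats.shape
        # Inject implicit geometric constraints via the pose embedding.
        kv = src_feats + self.pose_embed(rel_poses)
        # Fuse all source views at once rather than pair-wise.
        fused, _ = self.attn(ref_feat.unsqueeze(1), kv, kv)   # (B, 1, D)
        # Estimate how well each source view observes this pixel...
        q = self.quality(kv)                                   # (B, N, 1)
        # ...and move each view's hidden state toward its GRU update
        # in proportion to that estimated quality.
        upd = self.update(kv.reshape(B * N, D), hidden.reshape(B * N, D))
        new_hidden = hidden + q * (upd.reshape(B, N, D) - hidden)
        return fused.squeeze(1), new_hidden

# Example shapes: batch of 2 reference pixels, N=4 source views, dim=64.
# mda = MultiViewDisparityAttention(64)
# fused, h = mda(torch.randn(2, 64), torch.randn(2, 4, 64),
#                torch.randn(2, 4, 12), torch.zeros(2, 4, 64))
```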