MreNet: A Vision Transformer Network for Estimating Room Layouts from a Single RGB Panorama

Bibliographic Details
Published in: Applied sciences 2022-10, Vol.12 (19), p.9696
Main Authors: Xu, Bing; Sun, Yaohui; Meng, Xiangxu; Liu, Zhihan; Li, Wei
Format: Article
Language: English
Online Access: Full text
Description
Summary: A central problem in 3D room layout reconstruction is estimating the layout from a single panoramic image. In practice, the boundaries between indoor objects are difficult to define, such as the boundary between a sofa and a table, or between a picture frame and a wall. We propose MreNet, a novel neural network architecture for predicting 3D room layout, which outperforms previous state-of-the-art approaches. It efficiently models the overall layout of indoor rooms through a global receptive field and a sparse attention mechanism, whereas prior works tended to use CNNs to gradually increase the receptive field. Furthermore, the proposed feature connection mechanism addresses the vanishing gradient problem during training and yields feature maps of different granularity at different layers. Experiments on both cuboid-shaped and general Manhattan layouts show that the proposed method outperforms recent algorithms in prediction accuracy.
ISSN: 2076-3417
DOI: 10.3390/app12199696