Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN

Bibliographic details
Published in:	Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 2024-10, Vol.92 (5), p.499-516
Main authors:	El Amrani Abouelassad, S., Mehltretter, M., Rottensteiner, F.
Format:	Article
Language:	English
Subjects:	Aerospace Technology and Astronautics; Astronomy; Computer Imaging; Earth and Environmental Science; Geographical Information Systems/Cartography; Geography; Observations and Techniques; Original Article; Pattern Recognition and Graphics; Remote Sensing/Photogrammetry; Signal, Image and Speech Processing; Vision
Online access:	Full text

Description
Abstract:	Estimating the pose and shape of vehicles from aerial images is an important yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of  cm in planimetry and  cm in height for keypoints defining the car shape.
ISSN:	2512-2789, 2512-2819
DOI:	10.1007/s41064-024-00311-0
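
Note on the reported error measures: the abstract quotes median position errors and RMS keypoint errors split into planimetry and height. The following Python sketch shows one common way such measures can be computed. It is not taken from the article: the function names, the use of the 2D point distance for the planimetric RMS, and the sample inputs are all assumptions made for illustration, and the authors may define the measures differently.

```python
import math
import statistics


def planimetric_rms(pred, ref):
    """RMS of the horizontal (X, Y) distances between matched 3D keypoints.

    Assumption: planimetric error is the 2D Euclidean distance per keypoint.
    """
    sq = [(p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 for p, r in zip(pred, ref)]
    return math.sqrt(sum(sq) / len(sq))


def height_rms(pred, ref):
    """RMS of the vertical (Z) differences between matched 3D keypoints."""
    sq = [(p[2] - r[2]) ** 2 for p, r in zip(pred, ref)]
    return math.sqrt(sum(sq) / len(sq))


def median_position_error(pred_centres, ref_centres):
    """Median Euclidean distance between predicted and reference positions."""
    return statistics.median(
        math.dist(p, r) for p, r in zip(pred_centres, ref_centres)
    )


def error_in_pixels(error_cm, gsd_cm):
    """Express a ground-space error in image pixels for a given GSD."""
    return error_cm / gsd_cm
```

Under these definitions, a 4 cm position error at a GSD of 3 cm corresponds to `error_in_pixels(4.0, 3.0)`, roughly 1.3 pixels, which puts the reported accuracy in relation to the image resolution.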