Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics

Representation learning for video is increasingly gaining attention in the field of computer vision. For instance, video prediction models enable activity and scene forecasting or vision-based planning and control. In this article, we investigate the combination of differentiable physics and spatial...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of computer vision 2022, Vol.130 (1), p.3-16
Hauptverfasser: Kandukuri, Rama Krishna, Achterhold, Jan, Moeller, Michael, Stueckler, Joerg
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Representation learning for video is increasingly gaining attention in the field of computer vision. For instance, video prediction models enable activity and scene forecasting or vision-based planning and control. In this article, we investigate the combination of differentiable physics and spatial transformers in a deep action conditional video representation network. By this combination our model learns a physically interpretable latent representation and can identify physical parameters. We propose supervised and self-supervised learning methods for our architecture. In experiments, we consider simulated scenarios with pushing, sliding and colliding objects, for which we also analyze the observability of the physical properties. We demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences. We evaluate the accuracy of our training methods, and demonstrate the ability of our method to predict future video frames from input images and actions.
ISSN:0920-5691
1573-1405
DOI:10.1007/s11263-021-01493-5