Deep Online Video Stabilization With Multi-Grid Warping Transformation Learning
Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D-, and 3D-based stabilization techniques have been presented previously, but to the best of our knowledge, no solutions based on deep neural networks had been proposed to da...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on image processing 2019-05, Vol.28 (5), p.2283-2292 |
---|---|
Hauptverfasser: | , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D-, and 3D-based stabilization techniques have been presented previously, but to the best of our knowledge, no solutions based on deep neural networks had been proposed to date. The main reason for this omission is shortage in training data as well as the challenge of modeling the problem using neural networks. In this paper, we present a video stabilization technique using a convolutional neural network. Previous works usually propose an off-line algorithm that smoothes a holistic camera path based on feature matching. Instead, we focus on low-latency, real-time camera path smoothing that does not explicitly represent the camera path and does not use future frames. Our neural network model, called StabNet, learns a set of mesh-grid transformations progressively for each input frame from the previous set of stabilized camera frames and creates stable corresponding latent camera paths implicitly. To train the network, we collect a dataset of synchronized steady and unsteady video pairs via a specially designed hand-held hardware. Experimental results show that our proposed online method performs comparatively to the traditional off-line video stabilization methods without using future frames while running about 10 times faster. More importantly, our proposed StabNet is able to handle low-quality videos, such as night-scene videos, watermarked videos, blurry videos, and noisy videos, where the existing methods fail in feature extraction or matching. |
---|---|
ISSN: | 1057-7149 1941-0042 |
DOI: | 10.1109/TIP.2018.2884280 |