Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The difficulty of annotating training data is a major obstacle to using CNNs
for low-level tasks in video. Synthetic data often does not generalize to real
videos, while unsupervised methods require heuristic losses. Proxy tasks can
overcome these issues: a network is first trained on a task for which
annotation is easier or which can be trained unsupervised. The trained network
is then fine-tuned for the original task using small amounts of ground truth
data. Here, we investigate frame interpolation as a proxy task for optical
flow. Using real movies, we train a CNN unsupervised for temporal
interpolation. Such a network implicitly estimates motion, but cannot handle
untextured regions. By fine-tuning on small amounts of ground truth flow, the
network can learn to fill in homogeneous regions and compute full optical flow
fields. Using this unsupervised pre-training, our network outperforms similar
architectures that were trained supervised using synthetic optical flow. |
---|---|
DOI: | 10.48550/arxiv.1809.08317 |
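
The summary notes that a network trained only for temporal interpolation implicitly estimates motion but cannot handle untextured regions. The following is a minimal sketch of why that happens, not the authors' network or loss: it assumes a single global horizontal shift in place of a dense flow field, and the names `interpolation_loss`, `textured`, and `flat` are hypothetical. With texture, only the correct motion reproduces the middle frame; on a uniform patch, every candidate motion scores equally well, so the interpolation objective gives no motion signal there.

```python
import numpy as np

def interpolation_loss(frame0, frame1, frame_mid, shift):
    """Photometric error of a middle frame synthesized from its two neighbours.

    Assumption: motion is one global horizontal shift of `shift` pixels per
    frame step (a stand-in for a dense flow field). np.roll wraps around,
    which keeps this toy example exact at the image borders.
    """
    from_prev = np.roll(frame0, shift, axis=1)    # move frame0 forward in time
    from_next = np.roll(frame1, -shift, axis=1)   # move frame1 backward in time
    synthesized = 0.5 * (from_prev + from_next)
    return float(np.mean(np.abs(synthesized - frame_mid)))

rng = np.random.default_rng(0)
true_shift = 2

# Textured triplet: a random pattern moving 2 pixels to the right per frame.
textured = rng.random((16, 16))
tex_frames = (textured,
              np.roll(textured, 2 * true_shift, axis=1),
              np.roll(textured, true_shift, axis=1))

# Untextured triplet: a uniform patch looks identical under any motion.
flat = np.full((16, 16), 0.5)
flat_frames = (flat, flat, flat)

candidates = range(0, 5)
tex_losses = [interpolation_loss(*tex_frames, s) for s in candidates]
flat_losses = [interpolation_loss(*flat_frames, s) for s in candidates]

print("textured:", np.round(tex_losses, 3))   # minimum at the true shift
print("flat:    ", np.round(flat_losses, 3))  # identical for every shift
```

In this toy setting the textured losses single out the true shift, while the flat losses are all equal. That ambiguity is the gap the summary says is filled by fine-tuning on small amounts of ground truth flow.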