Video Frame Interpolation with Flow Transformer
Format: Article
Language: English
Abstract: Video frame interpolation has been actively studied with the development of
convolutional neural networks. However, due to the intrinsic limitation of
kernel weight sharing in convolution, the interpolated frames it generates
may lose details. In contrast, the attention mechanism in the Transformer can
better distinguish the contribution of each pixel and can capture
long-range pixel dependencies, which offers great potential for video frame
interpolation. Nevertheless, the original Transformer is designed for 2D
images; how to develop a Transformer-based framework that incorporates
temporal self-attention for video frame interpolation remains an open issue. In
this paper, we propose the Video Frame Interpolation Flow Transformer, which
incorporates motion dynamics from optical flows into the self-attention
mechanism. Specifically, we design a Flow Transformer Block that computes
temporal self-attention within a flow-matched local area,
making our framework suitable for interpolating frames with large motion while
keeping complexity reasonably low. In addition, we construct a multi-scale
architecture to account for multi-scale motion, further improving the overall
performance. Extensive experiments on three benchmarks demonstrate that the
proposed method generates interpolated frames with better visual quality
than state-of-the-art methods.
DOI: 10.48550/arxiv.2307.16144
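
As a rough illustration of the idea described in the abstract, the sketch below is not the authors' implementation but a minimal PyTorch mock-up of flow-guided temporal self-attention: features of a neighbouring frame are backward-warped by an estimated optical flow, and each query pixel attends only to a small flow-aligned local window. The function names (`warp`, `flow_guided_attention`), the 3x3 window size, and the tensor shapes are assumptions for the example only.

```python
# Minimal sketch of flow-guided local temporal self-attention (assumed, not the paper's code).
import torch
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map (B, C, H, W) by an optical flow (B, 2, H, W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W), (x, y) order
    coords = grid.unsqueeze(0) + flow                             # shift sampling positions by flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0           # normalise to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)         # (B, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)


def flow_guided_attention(query_feat, neighbour_feat, flow, window=3):
    """Temporal self-attention where each query pixel attends only to a
    (window x window) patch of the flow-aligned neighbouring frame."""
    b, c, h, w = query_feat.shape
    aligned = warp(neighbour_feat, flow)                          # flow-matched local area
    # Gather the local window around every aligned pixel: (B, C * window^2, H*W).
    patches = F.unfold(aligned, kernel_size=window, padding=window // 2)
    patches = patches.view(b, c, window * window, h * w)          # (B, C, K, HW)
    q = query_feat.view(b, c, 1, h * w)                           # (B, C, 1, HW)
    # Scaled dot-product attention over the K window positions per pixel.
    attn = (q * patches).sum(dim=1, keepdim=True) / c ** 0.5      # (B, 1, K, HW)
    attn = attn.softmax(dim=2)
    out = (attn * patches).sum(dim=2)                             # (B, C, HW)
    return out.view(b, c, h, w)


if __name__ == "__main__":
    x0 = torch.randn(1, 16, 32, 32)   # features of a neighbouring frame
    xt = torch.randn(1, 16, 32, 32)   # features at the target time step
    flow = torch.randn(1, 2, 32, 32)  # assumed flow from target to neighbour
    y = flow_guided_attention(xt, x0, flow)
    print(y.shape)                    # torch.Size([1, 16, 32, 32])
```

Restricting attention to a flow-aligned window, as sketched here, is one way to keep the cost linear in the number of pixels while still handling large motion; the paper's multi-scale architecture would apply such attention at several resolutions.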