Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals
Format: | Article |
Language: | English |
Abstract: | In this paper, several variants of two-stream architectures for temporal
action proposal generation in long, untrimmed videos are presented. Inspired by
recent advances in the field of human action recognition that utilize 3D
convolutions in combination with two-stream networks, and based on the
Single-Stream Temporal Action Proposals (SST) architecture, four different
two-stream architectures are investigated, each utilizing sequences of images
on one stream and sequences of optical-flow images on the other. The four
architectures fuse the two streams at different depths in the model; for each
of them, a broad range of parameters is investigated systematically and an
optimal parametrization is determined empirically. Experiments on the
THUMOS'14 dataset show that all four two-stream architectures outperform the
original single-stream SST and achieve state-of-the-art results. Additional
experiments reveal that the improvements are not restricted to a single method
of computing optical flow: replacing the originally used Brox method with
FlowNet2 still yields improvements. |
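As a rough illustration of how two such streams can be combined at the output (the paper investigates four fusion depths; the function below is a hypothetical sketch of the simplest, late-fusion case, not the authors' exact formulation), per-timestep proposal confidence scores from the RGB stream and the optical-flow stream can be merged with a weighted average:

```python
import numpy as np

def fuse_proposal_scores(rgb_scores, flow_scores, weight=0.5):
    """Late fusion sketch: weighted average of per-timestep proposal
    confidences from the RGB stream and the optical-flow stream.
    Both inputs have shape (T, K): T timesteps, K proposal scales."""
    rgb = np.asarray(rgb_scores, dtype=float)
    flow = np.asarray(flow_scores, dtype=float)
    assert rgb.shape == flow.shape, "streams must score the same proposals"
    return weight * rgb + (1.0 - weight) * flow

# Toy example: 2 timesteps, 3 proposal scales per timestep.
rgb = np.array([[0.9, 0.2, 0.1], [0.4, 0.8, 0.3]])
flow = np.array([[0.7, 0.4, 0.1], [0.6, 0.6, 0.5]])
fused = fuse_proposal_scores(rgb, flow)
```

Earlier fusion variants would instead concatenate or sum feature maps inside the network before the recurrent proposal head, rather than averaging final scores.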
DOI: | 10.48550/arxiv.1903.04176 |