Using efficient group pseudo-3D network to learn spatio-temporal features
Published in: Signal, Image and Video Processing, 2021-03, Vol. 15 (2), pp. 361-369
Format: Article
Language: English
Abstract: Action classification has been a challenging problem in computer vision in recent years, and three-dimensional (3D) convolutional neural networks play an important role in spatio-temporal feature extraction. However, 3D convolution requires expensive computation and memory resources. This paper proposes an efficient group pseudo-3D (GP3D) convolution that reduces model size and computational cost. We built GP3D on MobileNetV3 so that 2D pre-trained parameters can be extended directly to the 3D convolutional network. We also used GP3D to replace the convolutions of the original inflated 3D convolutional network, which efficiently reduces its model size. Compared with other state-of-the-art 3D convolutional networks, GP3D with the efficient MobileNetV3 network uses about 3 to 22 times fewer parameters while maintaining the same accuracy on the UCF-101 dataset. GP3D with an inflated 3D convolutional network achieves about 90% top-1 accuracy, while its model size is only about half that of the original inflated 3D convolutional network.
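The abstract does not spell out the GP3D layer, so the following is only a sketch under stated assumptions, meant to show where parameter savings of this kind can come from. It assumes that "pseudo-3D" means splitting a k×k×k kernel into a 1×k×k spatial convolution followed by a k×1×1 temporal convolution, and that "group" means grouped convolution with `groups` channel groups. All helper names are hypothetical, and only weights are counted (no biases, no FLOPs). A side effect of the assumed split is that the 1×k×k spatial kernel has the same shape as a 2D k×k kernel, which is one plausible way 2D pre-trained weights could be reused without inflation.

```python
# Hypothetical helpers: weight counts for a full 3D convolution vs. a grouped
# pseudo-3D factorization. ASSUMPTION: pseudo-3D = 1 x k x k spatial conv
# followed by a k x 1 x 1 temporal conv; "group" = grouped convolution.

def conv3d_weights(c_in: int, c_out: int, kt: int, kh: int, kw: int, groups: int = 1) -> int:
    """Number of weights in a grouped 3D convolution (bias excluded)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel is connected to only c_in / groups input channels.
    return c_out * (c_in // groups) * kt * kh * kw


def full_3d_weights(c_in: int, c_out: int, k: int = 3) -> int:
    """A standard k x k x k 3D convolution, as used in inflated 3D networks."""
    return conv3d_weights(c_in, c_out, k, k, k)


def grouped_pseudo_3d_weights(c_in: int, c_out: int, k: int = 3, groups: int = 1) -> int:
    """Spatial 1 x k x k conv, then temporal k x 1 x 1 conv, both grouped."""
    spatial = conv3d_weights(c_in, c_out, 1, k, k, groups)
    temporal = conv3d_weights(c_out, c_out, k, 1, 1, groups)
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # illustrative channel count, not taken from the paper
    full = full_3d_weights(c, c)
    p3d = grouped_pseudo_3d_weights(c, c)
    gp3d = grouped_pseudo_3d_weights(c, c, groups=4)
    print(f"full 3D: {full}, pseudo-3D: {p3d}, grouped pseudo-3D (g=4): {gp3d}")
    print(f"reduction vs. full 3D: {full / gp3d:.1f}x")
```

With 64 input and output channels and k = 3, this prints 110592, 49152, and 12288 weights, a 9.0x reduction. That ratio follows entirely from the assumed channel count and group number; it is not the paper's reported 3 to 22 times saving, which compares whole networks.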
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-020-01758-5