A review of video action recognition based on 3D convolution

Bibliographic details
Published in: Computers & electrical engineering 2023-05, Vol.108, p.108713, Article 108713
Main authors: Huang, Xiankai; Cai, Zhibin
Format: Article
Language: English
Description
Summary: Video action recognition is one of the topics in video understanding. Over the past decade, it has made great progress thanks to the emergence of deep learning, and in particular to the application of 3D convolution, which has further improved recognition accuracy. However, three challenges remain: difficulty in capturing features of long videos, high computational cost, and difficulty in comparing methods because they are evaluated on different benchmarks. In view of these three challenges, this paper summarizes and analyzes existing video action recognition methods based on 3D convolution to help new researchers understand the field. Our contributions are threefold. First, we introduce the classical video action recognition methods based on 3D convolution and point out two problems with these methods. Second, we summarize the existing improved methods based on 3D convolution and the popular datasets, and we compare and analyze the experimental results of these methods on the benchmarks. Finally, we discuss current challenges in video action recognition and analyze future development trends.
ISSN: 0045-7906, 1879-0755
DOI:10.1016/j.compeleceng.2023.108713