Data-level information enhancement: Motion-patch-based Siamese Convolutional Neural Networks for human activity recognition in videos


Bibliographic Details
Published in: Expert Systems with Applications, 2020-06, Vol. 147, p. 113203, Article 113203
Main Authors: Zhang, Yujia; Man Po, Lai; Liu, Mengyang; Ur Rehman, Yasar Abbas; Ou, Weifeng; Zhao, Yuzhi
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Minimizes the problem of bad-sample generation caused by random cropping.
• A new attempt to improve results through data-level motion information enhancement.
• A simple but effective method to extract salient motion regions in video clips.
• An end-to-end learning framework that requires no training of hand-crafted features.
• The proposed method improves performance over state-of-the-art approaches.

Data augmentation is critical for deep-learning-based human activity recognition (HAR) systems. However, conventional data augmentation methods such as random cropping may generate bad samples that are unrelated to the particular activity (e.g., background patches without salient motion information). As a result, random-cropping-based data augmentation may negatively affect the overall performance of HAR systems. Humans, by contrast, tend to pay more attention to motion information when recognizing activities. In this work, we attempt to enhance the motion information in HAR systems and mitigate the influence of bad samples through a Siamese architecture, termed the Motion-patch-based Siamese Convolutional Neural Network (MSCNN). A motion patch is defined as a square region of the video that contains critical motion information, and we propose a simple yet effective method for selecting such regions. To evaluate the proposed MSCNN, we conduct experiments on the popular UCF-101 and HMDB-51 datasets. The mathematical model and experimental results show that the proposed architecture is capable of enhancing the motion information and achieves performance comparable to state-of-the-art approaches.
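The abstract does not spell out the patch-selection rule, so the following is only an illustrative sketch of the general idea, not the authors' MSCNN method: accumulate inter-frame differences into a motion-energy map, then score every candidate square window with a summed-area table and keep the highest-scoring one.

```python
import numpy as np

def motion_patch(clip, patch=16):
    """Locate the square region with the most motion energy in a clip.

    clip: array of shape (T, H, W), grayscale frames.
    Returns (top, left) of the patch x patch square whose accumulated
    inter-frame absolute difference is largest.
    """
    # Accumulate absolute differences between consecutive frames
    # into a single (H, W) motion-energy map.
    energy = np.abs(np.diff(clip.astype(np.float64), axis=0)).sum(axis=0)

    # A summed-area table scores every window position in O(1) each.
    sat = np.pad(energy, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    H, W = energy.shape
    ys = np.arange(H - patch + 1)[:, None]
    xs = np.arange(W - patch + 1)[None, :]
    scores = (sat[ys + patch, xs + patch] - sat[ys + patch, xs]
              - sat[ys, xs + patch] + sat[ys, xs])

    top, left = np.unravel_index(scores.argmax(), scores.shape)
    return int(top), int(left)
```

A crop taken at the returned coordinates is guaranteed to cover the most motion-dense area of the clip, which is exactly the kind of "bad sample" (static background patch) that unconstrained random cropping can miss.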
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2020.113203