Sparse Deep LSTMs with Convolutional Attention for Human Action Recognition
Saved in:
Published in: SN Computer Science, 2021-05, Vol. 2 (3), p. 151, Article 151
Main authors: , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Deep learning has recently achieved remarkable results in action recognition. In this paper, an architecture is proposed for action recognition comprising a ResNet feature extractor, a Conv-Attention-LSTM, a BiLSTM, and fully connected layers. Furthermore, a sparse layer is added after each LSTM layer to overcome overfitting. In addition to RGB images, optical flow is used to incorporate motion information into the architecture. Because consecutive frames are similar, video sequences are divided into equal parts, and frames from successive parts are used as consecutive frames to obtain the flow. Furthermore, a convolutional attention network is applied to find the significant regions. The proposed method is evaluated on two popular datasets, UCF-101 and HMDB-51, reaching accuracies of 95.24% and 71.62%, respectively. The results show that using a sparse layer instead of dropout reduces overfitting, and that a deep LSTM network yields a higher recognition rate than a one-layer LSTM.
ISSN: 2662-995X, 2661-8907
DOI: 10.1007/s42979-021-00576-x
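The frame-pairing scheme mentioned in the abstract (dividing a video into equal parts and treating frames from successive parts as "consecutive" frames for optical flow) can be illustrated with a minimal Python sketch. This is an assumption-laden reconstruction, not the paper's implementation: it assumes contiguous equal parts and takes the first frame of each part; all function names are illustrative.

```python
def split_into_parts(num_frames, num_parts):
    """Split frame indices 0..num_frames-1 into num_parts contiguous,
    (nearly) equal parts, as described in the abstract."""
    base, rem = divmod(num_frames, num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < rem else 0)  # spread any remainder over the first parts
        parts.append(list(range(start, start + size)))
        start += size
    return parts

def flow_frame_pairs(num_frames, num_parts):
    """Pair one frame from each part with one frame from the next part;
    each pair would feed an optical-flow computation (choice of the first
    frame per part is an assumption)."""
    parts = split_into_parts(num_frames, num_parts)
    return [(parts[i][0], parts[i + 1][0]) for i in range(len(parts) - 1)]
```

For example, `flow_frame_pairs(100, 5)` yields `[(0, 20), (20, 40), (40, 60), (60, 80)]`, so flow is computed between frames 20 apart rather than between truly adjacent, highly similar frames.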