Exploring a rich spatial–temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN
With the fast development of effective and low-cost human skeleton capture systems, skeleton-based action recognition has attracted much attention recently. Most existing methods using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have achieved promising performance for skeleton-based action recognition. However, these approaches are limited in their ability to explore rich spatial–temporal relational information. In this paper, we propose a new spatial–temporal model with an end-to-end bidirectional LSTM-CNN (BiLSTM-CNN). First, a hierarchical spatial–temporal dependent relational model is used to explore rich spatial–temporal information in the skeleton data. Then a new framework is proposed to fuse CNN and LSTM: the skeleton data are built by the dependent relational model and serve as the input of the proposed network; an LSTM then extracts the temporal features, followed by a standard CNN that explores the spatial information in the LSTM output. Finally, experimental results demonstrate the effectiveness of the proposed model on the NTU RGB+D, SBU Interaction and UTD-MHAD datasets.
Saved in:
Published in: | Neurocomputing (Amsterdam), 2020-11-13, Vol.414, p.90-100 |
---|---|
Main authors: | Zhu, Aichun; Wu, Qianyu; Cui, Ran; Wang, Tian; Hang, Wenlong; Hua, Gang; Snoussi, Hichem |
Format: | Article |
Language: | eng |
Subjects: | Action recognition; Dependent relational model; Engineering Sciences; Signal and Image processing; Spatial–temporal information |
Online access: | Full text |
DOI: | 10.1016/j.neucom.2020.07.068 |
ISSN: | 0925-2312 |
EISSN: | 1872-8286 |
Publisher: | Elsevier B.V |
Source: | Elsevier ScienceDirect Journals Complete |
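The abstract describes a two-stage pipeline: a bidirectional LSTM first extracts temporal features from the skeleton sequence, and a standard CNN then explores the spatial structure of the LSTM output. The sketch below illustrates only that high-level pipeline; the paper's dependent relational re-ordering of the joints and its actual layer configuration are not given in this record, so every hyperparameter here (25 joints and 60 classes as in NTU RGB+D, the hidden size, the CNN channels) is an illustrative assumption in PyTorch, not the authors' implementation.

```python
# Minimal sketch of a BiLSTM-CNN pipeline, assuming NTU RGB+D-like input
# (25 joints with 3-D coordinates, 60 action classes). All sizes are
# illustrative; the paper's exact architecture is not reproduced here.
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    """Bidirectional LSTM for temporal features, then a standard 2-D CNN
    over the LSTM output for spatial features, as the abstract describes."""
    def __init__(self, num_joints=25, coords=3, hidden=128, num_classes=60):
        super().__init__()
        # Each frame is flattened to (num_joints * coords) input features.
        self.bilstm = nn.LSTM(num_joints * coords, hidden,
                              batch_first=True, bidirectional=True)
        # Treat the (frames, 2 * hidden) BiLSTM output as a 1-channel "image".
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):           # x: (batch, frames, joints * coords)
        feats, _ = self.bilstm(x)   # (batch, frames, 2 * hidden)
        feats = feats.unsqueeze(1)  # (batch, 1, frames, 2 * hidden)
        return self.fc(self.cnn(feats).flatten(1))

# Usage example: a batch of 4 clips, 30 frames each.
model = BiLSTMCNN()
out = model(torch.randn(4, 30, 25 * 3))
print(out.shape)                    # torch.Size([4, 60])
```

Note that in the paper's framework the skeleton data are first rebuilt by the hierarchical dependent relational model before entering the network; the sketch feeds raw flattened coordinates instead, since that pre-processing step is not specified in this record.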