Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network

Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human actions can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper.

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on circuits and systems for video technology 2020-07, Vol.30 (7), p.2129-2140
Main Authors: Jiang, Xinghao, Xu, Ke, Sun, Tanfeng
Format: Article
Language: eng
Subjects:
Online Access: Request full text
container_end_page 2140
container_issue 7
container_start_page 2129
container_title IEEE transactions on circuits and systems for video technology
container_volume 30
creator Jiang, Xinghao
Xu, Ke
Sun, Tanfeng
description Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human actions can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both the spatial and temporal domains for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short-term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease intra-class diversity, a spatial-temporal auto-encoder (STAE) is proposed to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied in both the spatial and temporal domains to enhance robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained on STAE representations for temporal modeling and classification. Experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which proves the effectiveness of our method.
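The descriptor's rotation-and-translation view of skeleton motion can be illustrated with a minimal sketch (this is not the authors' implementation; the function name and the bone-as-3D-vector representation are assumptions for illustration). Given two bones as 3D vectors, it computes the rotation aligning one with the other via Rodrigues' formula, the SO(3) part of the rigid transform that Lie-group skeleton descriptors build on, together with the translation between them:

```python
import numpy as np

def relative_transform(bone_a, bone_b):
    """Rotation (3x3 matrix) and translation taking bone_a onto bone_b.

    The rotation is built with Rodrigues' formula from the axis and angle
    between the two bone directions; illustrative of the SO(3)/SE(3)
    relative transformations a Lie-group skeleton descriptor encodes.
    """
    a = bone_a / np.linalg.norm(bone_a)
    b = bone_b / np.linalg.norm(bone_b)
    v = np.cross(a, b)            # rotation axis, unnormalised (|v| = sin theta)
    s = np.linalg.norm(v)         # sin(theta)
    c = float(np.dot(a, b))       # cos(theta)
    if s < 1e-12:
        if c > 0:                 # parallel bones: no rotation needed
            return np.eye(3), bone_b - bone_a
        # antiparallel: 180-degree rotation about any axis perpendicular to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-12:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3), bone_b - bone_a
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])          # skew-symmetric cross-product matrix
    R = np.eye(3) + K + K @ K * ((1.0 - c) / s**2)  # Rodrigues' formula
    return R, bone_b - bone_a
```

Applied to every pair of adjacent bones in a frame, and to the same bone across consecutive frames, such relative transforms give the kind of per-frame spatial and temporal view the abstract describes.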
doi_str_mv 10.1109/TCSVT.2019.2914137
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_8703407</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8703407</ieee_id><sourcerecordid>2419496036</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</originalsourceid><addsrcrecordid>eNo9kMtOwzAQRSMEEqXwA7CJxDrF40dsL0t5SoVKJMDSct0JTR9JsVMh_p40rVjNHencGelE0SWQAQDRN_ko-8gHlIAeUA0cmDyKeiCESigl4rjNRECiKIjT6CyEBSHAFZe9aDJ0TVlX8Ru6-qsqu5y5Oa4xvrUBZ_FuX-IKmw7aeAxYNbbjPstmHt9lyTjLX-JXbH5qvzyPTgq7CnhxmP3o_eE-Hz0l48nj82g4ThzVokmcINKyKdPMCpgCJ07ZFFJQSB1OVeGI4oVmDAuXcgTgUpMZR-kItVAwyfrR9f7uxtffWwyNWdRbX7UvDeWguU4JS1uK7inn6xA8Fmbjy7X1vwaI2YkznTizE2cO4trS1b5UIuJ_QUnCOJHsD619aMU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2419496036</pqid></control><display><type>article</type><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><source>IEEE Electronic Library (IEL)</source><creator>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</creator><creatorcontrib>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</creatorcontrib><description>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. 
The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2019.2914137</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Action recognition ; Coders ; Domains ; DS-LSTM ; Gesture recognition ; Hidden Markov models ; Human activity recognition ; Human motion ; Image motion analysis ; Lie group ; Lie groups ; Long short term memory ; Misalignment ; Noise reduction ; Recurrent neural networks ; Representations ; Skeleton ; ST-STD ; STAE ; Transformations (mathematics)</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2020-07, Vol.30 (7), p.2129-2140</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</citedby><cites>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</cites><orcidid>0000-0001-8771-9402 ; 0000-0002-3253-5136 ; 0000-0002-9758-0579</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8703407$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8703407$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jiang, Xinghao</creatorcontrib><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Sun, Tanfeng</creatorcontrib><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. 
The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</description><subject>Action recognition</subject><subject>Coders</subject><subject>Domains</subject><subject>DS-LSTM</subject><subject>Gesture recognition</subject><subject>Hidden Markov models</subject><subject>Human activity recognition</subject><subject>Human motion</subject><subject>Image motion analysis</subject><subject>Lie group</subject><subject>Lie groups</subject><subject>Long short term memory</subject><subject>Misalignment</subject><subject>Noise reduction</subject><subject>Recurrent neural networks</subject><subject>Representations</subject><subject>Skeleton</subject><subject>ST-STD</subject><subject>STAE</subject><subject>Transformations 
(mathematics)</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMtOwzAQRSMEEqXwA7CJxDrF40dsL0t5SoVKJMDSct0JTR9JsVMh_p40rVjNHencGelE0SWQAQDRN_ko-8gHlIAeUA0cmDyKeiCESigl4rjNRECiKIjT6CyEBSHAFZe9aDJ0TVlX8Ru6-qsqu5y5Oa4xvrUBZ_FuX-IKmw7aeAxYNbbjPstmHt9lyTjLX-JXbH5qvzyPTgq7CnhxmP3o_eE-Hz0l48nj82g4ThzVokmcINKyKdPMCpgCJ07ZFFJQSB1OVeGI4oVmDAuXcgTgUpMZR-kItVAwyfrR9f7uxtffWwyNWdRbX7UvDeWguU4JS1uK7inn6xA8Fmbjy7X1vwaI2YkznTizE2cO4trS1b5UIuJ_QUnCOJHsD619aMU</recordid><startdate>20200701</startdate><enddate>20200701</enddate><creator>Jiang, Xinghao</creator><creator>Xu, Ke</creator><creator>Sun, Tanfeng</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-8771-9402</orcidid><orcidid>https://orcid.org/0000-0002-3253-5136</orcidid><orcidid>https://orcid.org/0000-0002-9758-0579</orcidid></search><sort><creationdate>20200701</creationdate><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><author>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Action recognition</topic><topic>Coders</topic><topic>Domains</topic><topic>DS-LSTM</topic><topic>Gesture recognition</topic><topic>Hidden Markov models</topic><topic>Human activity recognition</topic><topic>Human motion</topic><topic>Image motion 
analysis</topic><topic>Lie group</topic><topic>Lie groups</topic><topic>Long short term memory</topic><topic>Misalignment</topic><topic>Noise reduction</topic><topic>Recurrent neural networks</topic><topic>Representations</topic><topic>Skeleton</topic><topic>ST-STD</topic><topic>STAE</topic><topic>Transformations (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Xinghao</creatorcontrib><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Sun, Tanfeng</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jiang, Xinghao</au><au>Xu, Ke</au><au>Sun, Tanfeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</atitle><jtitle>IEEE transactions on circuits and systems for video 
technology</jtitle><stitle>TCSVT</stitle><date>2020-07-01</date><risdate>2020</risdate><volume>30</volume><issue>7</issue><spage>2129</spage><epage>2140</epage><pages>2129-2140</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. 
The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2019.2914137</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-8771-9402</orcidid><orcidid>https://orcid.org/0000-0002-3253-5136</orcidid><orcidid>https://orcid.org/0000-0002-9758-0579</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2020-07, Vol.30 (7), p.2129-2140
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_8703407
source IEEE Electronic Library (IEL)
subjects Action recognition
Coders
Domains
DS-LSTM
Gesture recognition
Hidden Markov models
Human activity recognition
Human motion
Image motion analysis
Lie group
Lie groups
Long short term memory
Misalignment
Noise reduction
Recurrent neural networks
Representations
Skeleton
ST-STD
STAE
Transformations (mathematics)
title Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T13%3A08%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Action%20Recognition%20Scheme%20Based%20on%20Skeleton%20Representation%20With%20DS-LSTM%20Network&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Jiang,%20Xinghao&rft.date=2020-07-01&rft.volume=30&rft.issue=7&rft.spage=2129&rft.epage=2140&rft.pages=2129-2140&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2019.2914137&rft_dat=%3Cproquest_RIE%3E2419496036%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2419496036&rft_id=info:pmid/&rft_ieee_id=8703407&rfr_iscdi=true