Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network

Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human actions can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper.

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on circuits and systems for video technology 2020-07, Vol.30 (7), p.2129-2140
Main Authors: Jiang, Xinghao, Xu, Ke, Sun, Tanfeng
Format: Article
Language: eng
Subjects:
Online Access: Request full text
container_end_page 2140
container_issue 7
container_start_page 2129
container_title IEEE transactions on circuits and systems for video technology
container_volume 30
creator Jiang, Xinghao
Xu, Ke
Sun, Tanfeng
description Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human actions can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both the spatial and temporal domains for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short-term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease intra-class diversity, a spatial-temporal auto-encoder (STAE) is proposed to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied in both the spatial and temporal domains to enhance robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained on STAE representations for temporal modeling and classification. Experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which proves the effectiveness of our method.
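The descriptor's rotation-and-translation view of skeleton motion can be illustrated with a minimal sketch (this is not the authors' implementation; the function name and the bone-as-3D-vector representation are assumptions for illustration). Given two bones as 3D vectors, it computes the rotation aligning one with the other via Rodrigues' formula, the SO(3) part of the rigid transform that Lie-group skeleton descriptors build on, together with the translation between them:

```python
import numpy as np

def relative_transform(bone_a, bone_b):
    """Rotation (3x3 matrix) and translation taking bone_a onto bone_b.

    The rotation is built with Rodrigues' formula from the axis and angle
    between the two bone directions; illustrative of the SO(3)/SE(3)
    relative transformations a Lie-group skeleton descriptor encodes.
    """
    a = bone_a / np.linalg.norm(bone_a)
    b = bone_b / np.linalg.norm(bone_b)
    v = np.cross(a, b)            # rotation axis, unnormalised (|v| = sin theta)
    s = np.linalg.norm(v)         # sin(theta)
    c = float(np.dot(a, b))       # cos(theta)
    if s < 1e-12:
        if c > 0:                 # parallel bones: no rotation needed
            return np.eye(3), bone_b - bone_a
        # antiparallel: 180-degree rotation about any axis perpendicular to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-12:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3), bone_b - bone_a
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])          # skew-symmetric cross-product matrix
    R = np.eye(3) + K + K @ K * ((1.0 - c) / s**2)  # Rodrigues' formula
    return R, bone_b - bone_a
```

Applied to every pair of adjacent bones in a frame, and to the same bone across consecutive frames, such relative transforms give the kind of per-frame spatial and temporal view the abstract describes.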
doi_str_mv 10.1109/TCSVT.2019.2914137
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_8703407</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8703407</ieee_id><sourcerecordid>2419496036</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</originalsourceid><addsrcrecordid>eNo9kMtOwzAQRSMEEqXwA7CJxDrF40dsL0t5SoVKJMDSct0JTR9JsVMh_p40rVjNHencGelE0SWQAQDRN_ko-8gHlIAeUA0cmDyKeiCESigl4rjNRECiKIjT6CyEBSHAFZe9aDJ0TVlX8Ru6-qsqu5y5Oa4xvrUBZ_FuX-IKmw7aeAxYNbbjPstmHt9lyTjLX-JXbH5qvzyPTgq7CnhxmP3o_eE-Hz0l48nj82g4ThzVokmcINKyKdPMCpgCJ07ZFFJQSB1OVeGI4oVmDAuXcgTgUpMZR-kItVAwyfrR9f7uxtffWwyNWdRbX7UvDeWguU4JS1uK7inn6xA8Fmbjy7X1vwaI2YkznTizE2cO4trS1b5UIuJ_QUnCOJHsD619aMU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2419496036</pqid></control><display><type>article</type><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><source>IEEE Electronic Library (IEL)</source><creator>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</creator><creatorcontrib>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</creatorcontrib><description>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. 
The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2019.2914137</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Action recognition ; Coders ; Domains ; DS-LSTM ; Gesture recognition ; Hidden Markov models ; Human activity recognition ; Human motion ; Image motion analysis ; Lie group ; Lie groups ; Long short term memory ; Misalignment ; Noise reduction ; Recurrent neural networks ; Representations ; Skeleton ; ST-STD ; STAE ; Transformations (mathematics)</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2020-07, Vol.30 (7), p.2129-2140</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</citedby><cites>FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</cites><orcidid>0000-0001-8771-9402 ; 0000-0002-3253-5136 ; 0000-0002-9758-0579</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8703407$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8703407$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jiang, Xinghao</creatorcontrib><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Sun, Tanfeng</creatorcontrib><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. 
The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</description><subject>Action recognition</subject><subject>Coders</subject><subject>Domains</subject><subject>DS-LSTM</subject><subject>Gesture recognition</subject><subject>Hidden Markov models</subject><subject>Human activity recognition</subject><subject>Human motion</subject><subject>Image motion analysis</subject><subject>Lie group</subject><subject>Lie groups</subject><subject>Long short term memory</subject><subject>Misalignment</subject><subject>Noise reduction</subject><subject>Recurrent neural networks</subject><subject>Representations</subject><subject>Skeleton</subject><subject>ST-STD</subject><subject>STAE</subject><subject>Transformations 
(mathematics)</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMtOwzAQRSMEEqXwA7CJxDrF40dsL0t5SoVKJMDSct0JTR9JsVMh_p40rVjNHencGelE0SWQAQDRN_ko-8gHlIAeUA0cmDyKeiCESigl4rjNRECiKIjT6CyEBSHAFZe9aDJ0TVlX8Ru6-qsqu5y5Oa4xvrUBZ_FuX-IKmw7aeAxYNbbjPstmHt9lyTjLX-JXbH5qvzyPTgq7CnhxmP3o_eE-Hz0l48nj82g4ThzVokmcINKyKdPMCpgCJ07ZFFJQSB1OVeGI4oVmDAuXcgTgUpMZR-kItVAwyfrR9f7uxtffWwyNWdRbX7UvDeWguU4JS1uK7inn6xA8Fmbjy7X1vwaI2YkznTizE2cO4trS1b5UIuJ_QUnCOJHsD619aMU</recordid><startdate>20200701</startdate><enddate>20200701</enddate><creator>Jiang, Xinghao</creator><creator>Xu, Ke</creator><creator>Sun, Tanfeng</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-8771-9402</orcidid><orcidid>https://orcid.org/0000-0002-3253-5136</orcidid><orcidid>https://orcid.org/0000-0002-9758-0579</orcidid></search><sort><creationdate>20200701</creationdate><title>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</title><author>Jiang, Xinghao ; Xu, Ke ; Sun, Tanfeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-c507a3b393a51b140c8a61618e2ceb8fc084f933efc64e114790d4e7c02a1f373</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Action recognition</topic><topic>Coders</topic><topic>Domains</topic><topic>DS-LSTM</topic><topic>Gesture recognition</topic><topic>Hidden Markov models</topic><topic>Human activity recognition</topic><topic>Human motion</topic><topic>Image motion 
analysis</topic><topic>Lie group</topic><topic>Lie groups</topic><topic>Long short term memory</topic><topic>Misalignment</topic><topic>Noise reduction</topic><topic>Recurrent neural networks</topic><topic>Representations</topic><topic>Skeleton</topic><topic>ST-STD</topic><topic>STAE</topic><topic>Transformations (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Xinghao</creatorcontrib><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Sun, Tanfeng</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jiang, Xinghao</au><au>Xu, Ke</au><au>Sun, Tanfeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network</atitle><jtitle>IEEE transactions on circuits and systems for video 
technology</jtitle><stitle>TCSVT</stitle><date>2020-07-01</date><risdate>2020</risdate><volume>30</volume><issue>7</issue><spage>2129</spage><epage>2140</epage><pages>2129-2140</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. 
The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2019.2914137</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-8771-9402</orcidid><orcidid>https://orcid.org/0000-0002-3253-5136</orcidid><orcidid>https://orcid.org/0000-0002-9758-0579</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2020-07, Vol.30 (7), p.2129-2140
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_8703407
source IEEE Electronic Library (IEL)
subjects Action recognition
Coders
Domains
DS-LSTM
Gesture recognition
Hidden Markov models
Human activity recognition
Human motion
Image motion analysis
Lie group
Lie groups
Long short term memory
Misalignment
Noise reduction
Recurrent neural networks
Representations
Skeleton
ST-STD
STAE
Transformations (mathematics)
title Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T13%3A08%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Action%20Recognition%20Scheme%20Based%20on%20Skeleton%20Representation%20With%20DS-LSTM%20Network&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Jiang,%20Xinghao&rft.date=2020-07-01&rft.volume=30&rft.issue=7&rft.spage=2129&rft.epage=2140&rft.pages=2129-2140&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2019.2914137&rft_dat=%3Cproquest_RIE%3E2419496036%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2419496036&rft_id=info:pmid/&rft_ieee_id=8703407&rfr_iscdi=true