Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network
Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed.
Saved in:
Published in: | IEEE transactions on circuits and systems for video technology 2020-07, Vol.30 (7), p.2129-2140 |
---|---|
Main Authors: | Jiang, Xinghao; Xu, Ke; Sun, Tanfeng |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
container_end_page | 2140 |
---|---|
container_issue | 7 |
container_start_page | 2129 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 30 |
creator | Jiang, Xinghao Xu, Ke Sun, Tanfeng |
description | Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipped with depth sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on Lie groups, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including rotation and translation during movement. It gives a comprehensive view of the skeleton in both the spatial and temporal domains for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short-term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease intra-class diversity, a spatial-temporal auto-encoder (STAE) is proposed to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied in both the spatial and temporal domains to enhance robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained on STAE representations for temporal modeling and classification. Experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which proves the effectiveness of our method. |
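As background to the descriptor the abstract sketches: a relative transformation between two body parts can be expressed as a rotation plus a translation (an element of SE(3)). The snippet below is a minimal illustrative sketch of computing such a transform for one bone pair with Rodrigues' formula; the function name and conventions are assumptions for illustration, not the paper's exact ST-STD construction.

```python
import numpy as np

def relative_transform(bone_a, bone_b):
    """Rotation (3x3 matrix) aligning bone_a with bone_b, plus the residual
    translation -- a hypothetical simplification of the rigid-transform idea
    behind skeleton descriptors on Lie groups."""
    u = bone_a / np.linalg.norm(bone_a)
    v = bone_b / np.linalg.norm(bone_b)
    axis = np.cross(u, v)             # rotation axis scaled by sin(theta)
    s = np.linalg.norm(axis)          # sin(theta)
    c = float(u @ v)                  # cos(theta)
    if s < 1e-12:
        # parallel bones; the antiparallel case is left out of this sketch
        R = np.eye(3)
    else:
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])   # skew-symmetric cross matrix
        R = np.eye(3) + K + K @ K * ((1.0 - c) / s**2)  # Rodrigues' formula
    t = bone_b - R @ bone_a           # translation left over after rotating
    return R, t
```

Applied per frame to every pair of connected body parts, such rotation/translation pairs give a per-frame view of the skeleton; the descriptor in the paper additionally spans the temporal domain.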
doi_str_mv | 10.1109/TCSVT.2019.2914137 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2020-07, Vol.30 (7), p.2129-2140 |
issn | 1051-8215 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_8703407 |
source | IEEE Electronic Library (IEL) |
subjects | Action recognition; Coders; Domains; DS-LSTM; Gesture recognition; Hidden Markov models; Human activity recognition; Human motion; Image motion analysis; Lie group; Lie groups; Long short term memory; Misalignment; Noise reduction; Recurrent neural networks; Representations; Skeleton; ST-STD; STAE; Transformations (mathematics) |
title | Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T13%3A08%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Action%20Recognition%20Scheme%20Based%20on%20Skeleton%20Representation%20With%20DS-LSTM%20Network&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Jiang,%20Xinghao&rft.date=2020-07-01&rft.volume=30&rft.issue=7&rft.spage=2129&rft.epage=2140&rft.pages=2129-2140&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2019.2914137&rft_dat=%3Cproquest_RIE%3E2419496036%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2419496036&rft_id=info:pmid/&rft_ieee_id=8703407&rfr_iscdi=true |
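The abstract describes the STAE as an auto-encoder trained under a denoising constraint (corrupt the input, reconstruct the clean target) and a sparsity constraint (penalize hidden units whose mean activation drifts from a small target). A minimal sketch of such a combined objective, assuming a one-hidden-layer encoder/decoder and a KL-divergence sparsity penalty — an illustration of the two constraints, not the paper's STAE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def denoising_sparse_loss(x, w_enc, w_dec, noise_std=0.1, rho=0.05, beta=0.5):
    """Denoising + sparsity auto-encoder objective (illustrative sketch).
    x: (batch, features) array of clean descriptors."""
    x_noisy = x + noise_std * rng.standard_normal(x.shape)  # denoising: corrupt the input
    h = sigmoid(x_noisy @ w_enc)                            # hidden code in (0, 1)
    x_hat = h @ w_dec                                       # linear reconstruction
    recon = np.mean((x_hat - x) ** 2)                       # score against the CLEAN input
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)       # mean activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)                 # sparsity: pull mean activation
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # toward the target rho
    return recon + beta * kl
```

Minimizing this loss over `w_enc`/`w_dec` yields codes that are robust to input noise and sparse on average; in the paper, such representations then feed a three-layer LSTM for temporal modeling and classification.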