Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Bibliographic details
Published in:	Computational intelligence 2019-08, Vol.35 (3), p.535-554
Main authors:	Wu, Yirui; Wei, Lianglei; Duan, Yucong
Format:	Article
Language:	eng
Subjects:	3D action recognition; Artificial intelligence; Body parts; Feature extraction; Feature recognition; Human activity recognition; Human motion; long short‐term memory; spatiotemporal analysis; video analysis; Wavelet transforms
Online access:	Full text

container_end_page	554
container_issue	3
container_start_page	535
container_title	Computational intelligence
container_volume	35
creator	Wu, Yirui; Wei, Lianglei; Duan, Yucong
description	With the rapid development of RGB‐D cameras and pose estimation techniques, action recognition based on three‐dimensional skeleton data has gained significant attention in the artificial intelligence community. In this paper, we incorporate temporal pattern descriptors of joint positions with the currently popular long short‐term memory (LSTM)–based learning scheme to obtain accurate and robust action recognition. Considering that actions are essentially formed by small subactions, we first utilize a two‐dimensional wavelet transform to extract temporal pattern descriptors in the frequency domain for each subaction. Afterward, we design a novel LSTM structure to extract deep features, which model a long‐term spatiotemporal correlation between body parts. Since temporal pattern descriptors and LSTM deep features can be regarded as multimodal representations for actions, we fuse them with an autoencoder network to achieve a more effective feature descriptor for action recognition. Experimental results on three challenging data sets with several comparative methods demonstrate the effectiveness of the proposed method for three‐dimensional action recognition.
doi_str_mv	10.1111/coin.12207
format	Article
publisher	Hoboken: Blackwell Publishing Ltd
rights	2019 Wiley Periodicals, Inc.
orcidid	https://orcid.org/0000-0003-3022-3718
linktohtml	https://onlinelibrary.wiley.com/doi/full/10.1111%2Fcoin.12207
linktopdf	https://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fcoin.12207
fulltext	fulltext
identifier	ISSN: 0824-7935
ispartof	Computational intelligence, 2019-08, Vol.35 (3), p.535-554
issn	0824-7935 1467-8640
language	eng
recordid	cdi_proquest_journals_2272632003
source	Wiley Online Library Journals Frontfile Complete; Business Source Complete
subjects	3D action recognition; Artificial intelligence; Body parts; Feature extraction; Feature recognition; Human activity recognition; Human motion; long short‐term memory; spatiotemporal analysis; video analysis; Wavelet transforms
title	Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T08%3A18%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20spatiotemporal%20LSTM%20network%20with%20temporal%20pattern%20feature%20for%203D%20human%20action%20recognition&rft.jtitle=Computational%20intelligence&rft.au=Wu,%20Yirui&rft.date=2019-08&rft.volume=35&rft.issue=3&rft.spage=535&rft.epage=554&rft.pages=535-554&rft.issn=0824-7935&rft.eissn=1467-8640&rft_id=info:doi/10.1111/coin.12207&rft_dat=%3Cproquest_cross%3E2272632003%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2272632003&rft_id=info:pmid/&rfr_iscdi=true
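According to the abstract, the method begins by applying a two-dimensional wavelet transform to each subaction, producing temporal pattern descriptors in the frequency domain. This record does not say which wavelet family is used, how many decomposition levels are applied, how subactions are segmented, or which coefficients make up the descriptor. The sketch below fills those gaps with assumptions, so it illustrates the general idea and is not the authors' method. It assumes that each subaction is an equal-length window of frames, that a one-level orthonormal Haar transform is applied over the (frames x joint coordinates) matrix, and that the descriptor is the energy of each of the four sub-bands. The names `haar_2d` and `subaction_descriptors` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Orthonormal Haar scaling factor 1/sqrt(2); it makes the transform energy-preserving.
_S = 2 ** -0.5


def _haar_cols(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Split every row of m into low-pass (pairwise sums) and high-pass (pairwise differences) halves."""
    low = [[(r[j] + r[j + 1]) * _S for j in range(0, len(r), 2)] for r in m]
    high = [[(r[j] - r[j + 1]) * _S for j in range(0, len(r), 2)] for r in m]
    return low, high


def haar_2d(block: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level 2D Haar transform of a matrix with even numbers of rows and columns.

    Rows are frames (time) and columns are joint coordinates. Returns the
    sub-bands (LL, LH, HL, HH); the first letter is the time band and the
    second letter is the coordinate band.
    """
    # Filter along time: combine consecutive frames pairwise.
    pairs = range(0, len(block), 2)
    low_t = [[(a + b) * _S for a, b in zip(block[i], block[i + 1])] for i in pairs]
    high_t = [[(a - b) * _S for a, b in zip(block[i], block[i + 1])] for i in pairs]
    # Filter along the joint-coordinate axis.
    ll, lh = _haar_cols(low_t)
    hl, hh = _haar_cols(high_t)
    return ll, lh, hl, hh


def subaction_descriptors(frames: Matrix, num_subactions: int) -> List[List[float]]:
    """Split a (T x D) joint-position sequence into equal windows (hypothetical
    "subactions") and describe each window by the energy of its four Haar sub-bands.

    Trailing frames that do not fill a whole window are dropped.
    """
    if num_subactions < 1:
        raise ValueError("num_subactions must be positive")
    seg_len = len(frames) // num_subactions
    dim = len(frames[0]) if frames else 0
    if seg_len < 2 or seg_len % 2 or dim < 2 or dim % 2:
        raise ValueError("window length and coordinate dimension must be even and >= 2")
    descriptors = []
    for k in range(num_subactions):
        window = frames[k * seg_len:(k + 1) * seg_len]
        bands = haar_2d(window)
        # Energy (sum of squared coefficients) of each sub-band.
        descriptors.append([sum(v * v for row in band for v in row) for band in bands])
    return descriptors


if __name__ == "__main__":
    # Toy sequence: 8 frames of 4 coordinates (e.g. two joints in 2D), split into two subactions.
    seq = [[float(3 * t + d) for d in range(4)] for t in range(8)]
    for desc in subaction_descriptors(seq, num_subactions=2):
        print([round(e, 3) for e in desc])
```

Because the orthonormal Haar transform preserves energy, the four values for each window sum to that window's total squared magnitude. A fuller descriptor might keep the low-frequency coefficients themselves or use several decomposition levels. This record does not state which of these choices the paper makes.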