Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos
Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.
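The shared-specific factorization idea from the abstract can be illustrated with a small sketch. The block below is a minimal, hypothetical PyTorch rendition, not the architecture from the paper: the class name `SharedSpecificAE`, the single-layer encoders, and the loss weighting are all illustrative assumptions. Each modality is encoded into a shared component and a modality-specific component, and each modality must be reconstructable from its own specific code concatenated with the shared code.

```python
# Minimal sketch of a shared-specific factorization autoencoder.
# Names, layer sizes, and the single-layer encoders are illustrative
# assumptions, not the published architecture.
import torch
import torch.nn as nn

class SharedSpecificAE(nn.Module):
    def __init__(self, rgb_dim, depth_dim, shared_dim, specific_dim):
        super().__init__()
        # One encoder per modality per component (shared vs. specific).
        self.enc_rgb_shared = nn.Linear(rgb_dim, shared_dim)
        self.enc_rgb_specific = nn.Linear(rgb_dim, specific_dim)
        self.enc_depth_shared = nn.Linear(depth_dim, shared_dim)
        self.enc_depth_specific = nn.Linear(depth_dim, specific_dim)
        # Each modality is reconstructed from [shared, own-specific].
        self.dec_rgb = nn.Linear(shared_dim + specific_dim, rgb_dim)
        self.dec_depth = nn.Linear(shared_dim + specific_dim, depth_dim)

    def forward(self, rgb, depth):
        s_rgb = torch.relu(self.enc_rgb_shared(rgb))
        s_depth = torch.relu(self.enc_depth_shared(depth))
        p_rgb = torch.relu(self.enc_rgb_specific(rgb))
        p_depth = torch.relu(self.enc_depth_specific(depth))
        rgb_hat = self.dec_rgb(torch.cat([s_rgb, p_rgb], dim=1))
        depth_hat = self.dec_depth(torch.cat([s_depth, p_depth], dim=1))
        # Reconstruction terms plus a penalty pulling the two shared
        # codes together, so "shared" really means cross-modal.
        loss = (nn.functional.mse_loss(rgb_hat, rgb)
                + nn.functional.mse_loss(depth_hat, depth)
                + nn.functional.mse_loss(s_rgb, s_depth))
        return loss, (s_rgb, p_rgb, p_depth)
```

Stacking such layers, as the abstract's "hierarchy of components" suggests, would repeat this split at each level of the autoencoder.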
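The mixed-norm structured sparsity term can likewise be sketched. Below is a plain l2,1-style group penalty over classifier weights grouped by feature component: summing the l2 norms of per-component weight blocks drives whole components to zero (group selection between components), while an l1 term inside each block regularizes within a component. The function name, the grouping by column slices, and the two penalty weights are assumptions for illustration; the exact mixed norms used in the paper may differ.

```python
import torch

def mixed_norm_penalty(weight, group_slices, lam_between=1e-3, lam_within=1e-4):
    """Mixed-norm penalty over feature components (illustrative grouping).

    weight:       (num_classes, num_features) classifier weight matrix.
    group_slices: list of column slices, one per shared/specific component.
    """
    penalty = weight.new_zeros(())
    for sl in group_slices:
        block = weight[:, sl]
        # l2 norm per group, summed across groups (l2,1): zeroes out
        # whole components -> group selection between components.
        penalty = penalty + lam_between * torch.linalg.norm(block)
        # l1 inside the group: regularization within a component.
        penalty = penalty + lam_within * block.abs().sum()
    return penalty
```

In training, this penalty would simply be added to the classification loss; components whose group norms are driven to zero are effectively deselected.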
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2018-05, Vol.40 (5), p.1045-1058 |
---|---|
Main authors: | Shahroudy, Amir; Ng, Tian-Tsong; Gong, Yihong; Wang, Gang |
Format: | Article |
Language: | English |
Subjects: | action recognition; Classification; Correlation; Feature extraction; Feature recognition; Multimodal analysis; Norms; Regularization; RGB+D; Robustness; Sensors; Skeleton; State of the art; structured sparsity; Three-dimensional displays; Videos |
DOI: | 10.1109/TPAMI.2017.2691321 |
ISSN: | 0162-8828 |
EISSN: | 1939-3539; 2160-9292 |
Online access: | Order full text |