Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

Single-modality action recognition on RGB or depth sequences has been extensively explored in recent years. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Analysis of RGB+D videos can therefore help us better study the complementary properties of the two modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder-based shared-specific feature factorization network that separates the input multimodal signals into a hierarchy of components. Further, based on the structure of these features, we propose a structured sparsity learning machine that uses mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results demonstrate the effectiveness of our cross-modality feature analysis framework, which achieves state-of-the-art accuracy for action classification on five challenging benchmark datasets.
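
The factorization idea in the abstract can be illustrated with a small sketch. The PyTorch code below is a minimal, hypothetical rendering of a shared-specific autoencoder: one encoder per modality splits its features into a shared and a modality-specific component, decoders reconstruct each modality, and an agreement term ties the two shared components together. The layer sizes, single-layer depth, activation, and loss weighting are assumptions for exposition, not the paper's architecture (which builds a hierarchy of such components).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpecificAE(nn.Module):
    """Toy shared-specific factorization for two modalities (RGB, depth)."""

    def __init__(self, dim_rgb, dim_depth, dim_shared, dim_specific):
        super().__init__()
        # Each encoder emits a concatenated [shared | specific] code.
        self.enc_rgb = nn.Linear(dim_rgb, dim_shared + dim_specific)
        self.enc_depth = nn.Linear(dim_depth, dim_shared + dim_specific)
        # Each decoder reconstructs its modality from [shared | specific].
        self.dec_rgb = nn.Linear(dim_shared + dim_specific, dim_rgb)
        self.dec_depth = nn.Linear(dim_shared + dim_specific, dim_depth)
        self.dim_shared = dim_shared

    def forward(self, x_rgb, x_depth):
        h_rgb = torch.tanh(self.enc_rgb(x_rgb))
        h_dep = torch.tanh(self.enc_depth(x_depth))
        k = self.dim_shared
        s_rgb, p_rgb = h_rgb[:, :k], h_rgb[:, k:]  # shared / RGB-specific
        s_dep, p_dep = h_dep[:, :k], h_dep[:, k:]  # shared / depth-specific
        rec_rgb = self.dec_rgb(torch.cat([s_rgb, p_rgb], dim=1))
        rec_dep = self.dec_depth(torch.cat([s_dep, p_dep], dim=1))
        return (s_rgb, p_rgb, s_dep, p_dep), (rec_rgb, rec_dep)

def factorization_loss(model, x_rgb, x_depth, lam=0.1):
    (s_rgb, _, s_dep, _), (rec_rgb, rec_dep) = model(x_rgb, x_depth)
    recon = F.mse_loss(rec_rgb, x_rgb) + F.mse_loss(rec_dep, x_depth)
    agree = F.mse_loss(s_rgb, s_dep)  # pull the two shared codes together
    return recon + lam * agree
```

The structured sparsity learner can likewise be sketched as a mixed-norm penalty: an l2 norm within each feature component and an l1 sum across components, so whole components can be driven to zero (group selection). The group boundaries and penalty weight below are hypothetical, not taken from the paper.

```python
def l21_penalty(weight, groups):
    """Mixed-norm (l2,1) penalty: l2 within each group of columns,
    summed (l1-style) across groups, so whole groups shrink to zero.

    weight: (num_classes, num_features) classifier weight matrix
    groups: list of column-index tensors, one per feature component
    """
    return sum(weight[:, g].norm(p=2) for g in groups)

# Hypothetical usage with three 64-dim components in a 192-dim feature:
# groups = [torch.arange(0, 64), torch.arange(64, 128), torch.arange(128, 192)]
# loss = F.cross_entropy(logits, labels) + mu * l21_penalty(clf.weight, groups)
```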

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018-05, Vol. 40 (5), pp. 1045-1058
Authors: Shahroudy, Amir; Ng, Tian-Tsong; Gong, Yihong; Wang, Gang
Format: Article
Language: English
Publisher: IEEE (United States)
DOI: 10.1109/TPAMI.2017.2691321
ISSN: 0162-8828
EISSN: 1939-3539, 2160-9292
PMID: 28391189
Source: IEEE Electronic Library (IEL)
Subjects:
action recognition
Classification
Correlation
Feature extraction
Feature recognition
Multimodal analysis
Norms
Regularization
RGB+D
Robustness
Sensors
Skeleton
State of the art
structured sparsity
Three-dimensional displays
Videos