Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos
Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.
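The shared-specific factorization idea from the abstract can be illustrated with a small sketch. The block below is a minimal, hypothetical PyTorch rendition, not the architecture from the paper: the class name `SharedSpecificAE`, the single-layer encoders, and the loss weighting are all illustrative assumptions. Each modality is encoded into a shared component and a modality-specific component, and each modality must be reconstructable from its own specific code concatenated with the shared code.

```python
# Minimal sketch of a shared-specific factorization autoencoder.
# Names, layer sizes, and the single-layer encoders are illustrative
# assumptions, not the published architecture.
import torch
import torch.nn as nn

class SharedSpecificAE(nn.Module):
    def __init__(self, rgb_dim, depth_dim, shared_dim, specific_dim):
        super().__init__()
        # One encoder per modality per component (shared vs. specific).
        self.enc_rgb_shared = nn.Linear(rgb_dim, shared_dim)
        self.enc_rgb_specific = nn.Linear(rgb_dim, specific_dim)
        self.enc_depth_shared = nn.Linear(depth_dim, shared_dim)
        self.enc_depth_specific = nn.Linear(depth_dim, specific_dim)
        # Each modality is reconstructed from [shared, own-specific].
        self.dec_rgb = nn.Linear(shared_dim + specific_dim, rgb_dim)
        self.dec_depth = nn.Linear(shared_dim + specific_dim, depth_dim)

    def forward(self, rgb, depth):
        s_rgb = torch.relu(self.enc_rgb_shared(rgb))
        s_depth = torch.relu(self.enc_depth_shared(depth))
        p_rgb = torch.relu(self.enc_rgb_specific(rgb))
        p_depth = torch.relu(self.enc_depth_specific(depth))
        rgb_hat = self.dec_rgb(torch.cat([s_rgb, p_rgb], dim=1))
        depth_hat = self.dec_depth(torch.cat([s_depth, p_depth], dim=1))
        # Reconstruction terms plus a penalty pulling the two shared
        # codes together, so "shared" really means cross-modal.
        loss = (nn.functional.mse_loss(rgb_hat, rgb)
                + nn.functional.mse_loss(depth_hat, depth)
                + nn.functional.mse_loss(s_rgb, s_depth))
        return loss, (s_rgb, p_rgb, p_depth)
```

Stacking such layers, as the abstract's "hierarchy of components" suggests, would repeat this split at each level of the autoencoder.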
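The mixed-norm structured sparsity term can likewise be sketched. Below is a plain l2,1-style group penalty over classifier weights grouped by feature component: summing the l2 norms of per-component weight blocks drives whole components to zero (group selection between components), while an l1 term inside each block regularizes within a component. The function name, the grouping by column slices, and the two penalty weights are assumptions for illustration; the exact mixed norms used in the paper may differ.

```python
import torch

def mixed_norm_penalty(weight, group_slices, lam_between=1e-3, lam_within=1e-4):
    """Mixed-norm penalty over feature components (illustrative grouping).

    weight:       (num_classes, num_features) classifier weight matrix.
    group_slices: list of column slices, one per shared/specific component.
    """
    penalty = weight.new_zeros(())
    for sl in group_slices:
        block = weight[:, sl]
        # l2 norm per group, summed across groups (l2,1): zeroes out
        # whole components -> group selection between components.
        penalty = penalty + lam_between * torch.linalg.norm(block)
        # l1 inside the group: regularization within a component.
        penalty = penalty + lam_within * block.abs().sum()
    return penalty
```

In training, this penalty would simply be added to the classification loss; components whose group norms are driven to zero are effectively deselected.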
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2018-05, Vol.40 (5), p.1045-1058 |
---|---|
Main authors: | Shahroudy, Amir; Ng, Tian-Tsong; Gong, Yihong; Wang, Gang |
Format: | Article |
Language: | English |
Subjects: | action recognition; Classification; Correlation; Feature extraction; Feature recognition; Multimodal analysis; Norms; Regularization; RGB+D; Robustness; Sensors; Skeleton; State of the art; structured sparsity; Three-dimensional displays; Videos |
DOI: | 10.1109/TPAMI.2017.2691321 |
ISSN: | 0162-8828 |
EISSN: | 1939-3539; 2160-9292 |
Online access: | Order full text |