Combining 2D and 3D deep models for action recognition with depth information
In activity recognition, the use of depth data is a rapidly growing research area. This paper presents a method for recognizing single-person activities and dyadic interactions by using deep features extracted from both 3D and 2D representations, which are constructed from depth sequences. First, a 3D...
Saved in:
Published in: | Signal, image and video processing, 2018-09, Vol.12 (6), p.1197-1205 |
---|---|
Main authors: | Keçeli, Ali Seydi ; Kaya, Aydın ; Can, Ahmet Burak |
Format: | Article |
Language: | eng |
Subjects: | Activity recognition ; Feature extraction ; Image Processing and Computer Vision ; Pattern Recognition and Graphics |
Online access: | Full text |
container_end_page | 1205 |
---|---|
container_issue | 6 |
container_start_page | 1197 |
container_title | Signal, image and video processing |
container_volume | 12 |
creator | Keçeli, Ali Seydi ; Kaya, Aydın ; Can, Ahmet Burak |
description | In activity recognition, the use of depth data is a rapidly growing research area. This paper presents a method for recognizing single-person activities and dyadic interactions by using deep features extracted from both 3D and 2D representations, which are constructed from depth sequences. First, a 3D volume representation is generated by considering spatiotemporal information in the depth frames of an action sequence. Then, a 3D-CNN is trained to learn features from these 3D volume representations. In addition, a 2D representation is constructed from the weighted sum of the depth sequences. This 2D representation is used with a pre-trained CNN model. Features learned from this model and the 3D-CNN model are used to train the final classifier after a feature selection step. Among the various classifiers, an SVM-based model produced the best results. The proposed method was tested on the MSR-Action3D dataset for single-person activities, the SBU dataset for dyadic interactions, and the NTU RGB+D dataset for both types of actions. Experimental results show that the proposed 3D and 2D representations and the deep features extracted from them are robust and efficient. The proposed method achieves results comparable with state-of-the-art methods in the literature. |
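The abstract's pipeline (a 2D map built as a weighted sum of depth frames, then CNN features from two streams fused before an SVM) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the exponential weighting scheme is an assumption (the paper does not specify its weights here), and the CNN feature extractors are replaced by placeholder vectors.

```python
import numpy as np

def depth_sequence_to_2d(frames, decay=0.9):
    """Collapse a depth sequence of shape (T, H, W) into one 2D map
    via a weighted sum over time. The exponential weighting (later
    frames weighted higher) is an illustrative assumption, not the
    paper's exact scheme."""
    frames = np.asarray(frames, dtype=np.float64)
    t = frames.shape[0]
    weights = decay ** np.arange(t - 1, -1, -1)  # most recent frame -> largest weight
    weights /= weights.sum()                     # weights sum to 1
    # Contract the time axis: (T,) x (T, H, W) -> (H, W)
    return np.tensordot(weights, frames, axes=1)

def fuse_features(feat_3d, feat_2d):
    """Concatenate the two feature streams (3D-CNN and pre-trained
    2D CNN) after l2-normalizing each, as a stand-in for the fusion
    step that precedes feature selection and the SVM. The choice of
    l2 normalization is an assumption."""
    a = feat_3d / (np.linalg.norm(feat_3d) + 1e-12)
    b = feat_2d / (np.linalg.norm(feat_2d) + 1e-12)
    return np.concatenate([a, b])

if __name__ == "__main__":
    # Synthetic 6-frame depth sequence of 4x4 frames.
    seq = np.random.rand(6, 4, 4)
    map_2d = depth_sequence_to_2d(seq)          # (4, 4) motion-weighted map
    fused = fuse_features(np.random.rand(128),  # stand-in 3D-CNN features
                          np.random.rand(64))   # stand-in 2D-CNN features
    print(map_2d.shape, fused.shape)
```

In the paper itself the fused vector would then pass through feature selection and an SVM classifier; both are omitted here since the record does not specify them.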
doi_str_mv | 10.1007/s11760-018-1271-3 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2018-09, Vol.12 (6), p.1197-1205 |
issn | 1863-1703 ; 1863-1711 |
language | eng |
recordid | cdi_proquest_journals_2087123124 |
source | Springer Nature - Complete Springer Journals |
subjects | Activity recognition ; Computer Imaging ; Computer Science ; Datasets ; Feature extraction ; Image Processing and Computer Vision ; Multimedia Information Systems ; Original Paper ; Pattern Recognition and Graphics ; Representations ; Signal, Image and Speech Processing ; Three dimensional models ; Two dimensional models ; Vision |
title | Combining 2D and 3D deep models for action recognition with depth information |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T09%3A57%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combining%202D%20and%203D%20deep%20models%20for%20action%20recognition%20with%20depth%20information&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Ke%C3%A7eli,%20Ali%20Seydi&rft.date=2018-09-01&rft.volume=12&rft.issue=6&rft.spage=1197&rft.epage=1205&rft.pages=1197-1205&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-018-1271-3&rft_dat=%3Cproquest_cross%3E2087123124%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2087123124&rft_id=info:pmid/&rfr_iscdi=true |