Combining 2D and 3D deep models for action recognition with depth information

Bibliographic Details
Published in: Signal, Image and Video Processing, 2018-09, Vol. 12 (6), p. 1197-1205
Main Authors: Keçeli, Ali Seydi; Kaya, Aydın; Can, Ahmet Burak
Format: Article
Language: English
Publisher: Springer London
Subjects: Activity recognition; Computer Imaging; Computer Science; Datasets; Feature extraction; Image Processing and Computer Vision; Multimedia Information Systems; Original Paper; Pattern Recognition and Graphics; Representations; Signal, Image and Speech Processing; Three dimensional models; Two dimensional models; Vision
Online access: Full text

Description: In activity recognition, the use of depth data is a rapidly growing research area. This paper presents a method for recognizing single-person activities and dyadic interactions using deep features extracted from both 3D and 2D representations constructed from depth sequences. First, a 3D volume representation is generated from the spatiotemporal information in the depth frames of an action sequence, and a 3D-CNN is trained to learn features from these volumes. In addition, a 2D representation is constructed as a weighted sum of the depth sequence and fed to a pre-trained CNN model. Features learned by this model and the 3D-CNN are combined and, after a feature selection step, used to train the final classifier. Among the classifiers evaluated, an SVM-based model produced the best results. The proposed method was tested on the MSR-Action3D dataset for single-person activities, the SBU dataset for dyadic interactions, and the NTU RGB+D dataset for both types of actions. Experimental results show that the proposed 3D and 2D representations, and the deep features extracted from them, are robust and efficient, and that the method achieves results comparable to state-of-the-art methods in the literature.
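
The pipeline the abstract describes (a 3D-CNN over depth volumes, a pre-trained 2D CNN over a weighted-sum image, then feature fusion, selection, and an SVM) can be sketched as follows. This is a minimal illustration under stated assumptions: the linear frame weighting, the feature dimensions, the SelectKBest selector, and the RBF kernel are hypothetical stand-ins, since the record does not specify the authors' choices.

    # Minimal sketch of the fusion -> selection -> SVM pipeline outlined
    # in the abstract. The weighting scheme, CNN architectures, and
    # feature selector below are illustrative assumptions, not the
    # authors' implementation.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def weighted_sum_image(depth_seq):
        """Collapse a depth sequence (T, H, W) into one 2D image.

        Assumption: linearly increasing weights, so later frames
        dominate (similar in spirit to a motion history image)."""
        t = depth_seq.shape[0]
        weights = np.linspace(0.0, 1.0, t)[:, None, None]
        img = (weights * depth_seq).sum(axis=0)
        return img / (img.max() + 1e-8)  # normalize for the CNN input

    rng = np.random.default_rng(0)
    depth_seq = rng.random(size=(32, 64, 64))  # hypothetical depth clip (T, H, W)
    img_2d = weighted_sum_image(depth_seq)     # input to the pre-trained 2D CNN

    # Stand-ins for the deep features: in the paper these come from a
    # 3D-CNN on the volume and a pre-trained 2D CNN on the image above.
    feats_3d = rng.normal(size=(100, 256))   # hypothetical 3D-CNN features
    feats_2d = rng.normal(size=(100, 512))   # hypothetical 2D-CNN features
    labels = rng.integers(0, 10, size=100)   # hypothetical action labels

    # Fuse the two feature sets, select a subset, and train an SVM,
    # mirroring the fusion, selection, and classification steps.
    fused = np.hstack([feats_3d, feats_2d])
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=128),
                        SVC(kernel="rbf"))
    clf.fit(fused, labels)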

DOI: 10.1007/s11760-018-1271-3
ISSN: 1863-1703
EISSN: 1863-1711
Source: Springer Nature - Complete Springer Journals