Combining 2D and 3D deep models for action recognition with depth information

Bibliographic Details
Published in: Signal, Image and Video Processing, 2018-09, Vol. 12 (6), p. 1197-1205
Main Authors: Keçeli, Ali Seydi; Kaya, Aydın; Can, Ahmet Burak
Format: Article
Language: English
Publisher: Springer London
Subjects: Activity recognition; Computer Imaging; Computer Science; Datasets; Feature extraction; Image Processing and Computer Vision; Multimedia Information Systems; Original Paper; Pattern Recognition and Graphics; Representations; Signal, Image and Speech Processing; Three dimensional models; Two dimensional models; Vision
Online access: Full text

Description: In activity recognition, the use of depth data is a rapidly growing research area. This paper presents a method for recognizing single-person activities and dyadic interactions using deep features extracted from both 3D and 2D representations constructed from depth sequences. First, a 3D volume representation is generated from the spatiotemporal information in the depth frames of an action sequence, and a 3D-CNN is trained to learn features from these volumes. In addition, a 2D representation is constructed as a weighted sum of the depth sequence and fed to a pre-trained CNN model. Features learned by this model and the 3D-CNN are combined and, after a feature selection step, used to train the final classifier. Among the classifiers evaluated, an SVM-based model produced the best results. The proposed method was tested on the MSR-Action3D dataset for single-person activities, the SBU dataset for dyadic interactions, and the NTU RGB+D dataset for both types of actions. Experimental results show that the proposed 3D and 2D representations, and the deep features extracted from them, are robust and efficient, and that the method achieves results comparable to state-of-the-art methods in the literature.
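
The pipeline the abstract describes (a 3D-CNN over depth volumes, a pre-trained 2D CNN over a weighted-sum image, then feature fusion, selection, and an SVM) can be sketched as follows. This is a minimal illustration under stated assumptions: the linear frame weighting, the feature dimensions, the SelectKBest selector, and the RBF kernel are hypothetical stand-ins, since the record does not specify the authors' choices.

    # Minimal sketch of the fusion -> selection -> SVM pipeline outlined
    # in the abstract. The weighting scheme, CNN architectures, and
    # feature selector below are illustrative assumptions, not the
    # authors' implementation.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def weighted_sum_image(depth_seq):
        """Collapse a depth sequence (T, H, W) into one 2D image.

        Assumption: linearly increasing weights, so later frames
        dominate (similar in spirit to a motion history image)."""
        t = depth_seq.shape[0]
        weights = np.linspace(0.0, 1.0, t)[:, None, None]
        img = (weights * depth_seq).sum(axis=0)
        return img / (img.max() + 1e-8)  # normalize for the CNN input

    rng = np.random.default_rng(0)
    depth_seq = rng.random(size=(32, 64, 64))  # hypothetical depth clip (T, H, W)
    img_2d = weighted_sum_image(depth_seq)     # input to the pre-trained 2D CNN

    # Stand-ins for the deep features: in the paper these come from a
    # 3D-CNN on the volume and a pre-trained 2D CNN on the image above.
    feats_3d = rng.normal(size=(100, 256))   # hypothetical 3D-CNN features
    feats_2d = rng.normal(size=(100, 512))   # hypothetical 2D-CNN features
    labels = rng.integers(0, 10, size=100)   # hypothetical action labels

    # Fuse the two feature sets, select a subset, and train an SVM,
    # mirroring the fusion, selection, and classification steps.
    fused = np.hstack([feats_3d, feats_2d])
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=128),
                        SVC(kernel="rbf"))
    clf.fit(fused, labels)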

DOI: 10.1007/s11760-018-1271-3
ISSN: 1863-1703
EISSN: 1863-1711
Source: Springer Nature - Complete Springer Journals