Video spatiotemporal mapping for human action recognition by convolutional neural network



Bibliographic Details
Published in: Pattern analysis and applications : PAA, 2020-02, Vol.23 (1), p.265-279
Main authors: Zare, Amin; Abrishami Moghaddam, Hamid; Sharifi, Arash
Format: Article
Language: eng
Online access: Full text
container_end_page 279
container_issue 1
container_start_page 265
container_title Pattern analysis and applications : PAA
container_volume 23
creator Zare, Amin
Abrishami Moghaddam, Hamid
Sharifi, Arash
description In this paper, a 2D representation of a video clip called the video spatiotemporal map (VSTM) is presented. The VSTM is a compact representation of a video clip that incorporates its spatial and temporal properties. It is created by vertically concatenating feature vectors generated from subsequent frames. The feature vector corresponding to each frame is produced by applying a wavelet transform to that frame (or to its difference from the subsequent frame) and computing the vertical and horizontal projections of the quantized coefficients of some specific wavelet subbands. The VSTM enables convolutional neural networks (CNNs) to process a video clip for human action recognition (HAR). The proposed approach benefits from the power of CNNs to analyze visual patterns and attempts to overcome CNN challenges such as the variable-video-length problem and the lack of training data that leads to over-fitting. The VSTM presents a sequence of frames to a CNN without imposing any additional computational cost on the CNN learning algorithm. Experimental results on the KTH, Weizmann, and UCF Sports HAR benchmark datasets show the superiority of the proposed method over state-of-the-art methods that use CNNs to solve the HAR problem.
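For concreteness, below is a minimal Python sketch of the pipeline the abstract describes, using NumPy and PyWavelets. The wavelet family ('haar'), the chosen subbands (the horizontal and vertical detail bands of a single-level decomposition), the uniform eight-level quantization, and the projection axes are all illustrative assumptions; the abstract only says that quantized coefficients of some specific subbands are projected, not which settings the authors actually use.

```python
# A minimal sketch of the VSTM construction, under the assumptions
# stated above (wavelet, subbands, quantization are NOT from the paper).
import numpy as np
import pywt

def frame_feature(frame, wavelet="haar", levels=8):
    """Feature vector for one grayscale frame: projections of
    quantized coefficients of selected wavelet subbands."""
    # Single-level 2D DWT; keep the horizontal and vertical detail
    # subbands (an assumed choice of "specific wavelet subbands").
    _, (cH, cV, _) = pywt.dwt2(np.asarray(frame, dtype=np.float64), wavelet)
    parts = []
    for band in (cH, cV):
        lo, hi = band.min(), band.max()
        # Uniform quantization into `levels` bins (assumed scheme).
        q = (np.zeros_like(band) if hi == lo
             else np.floor((band - lo) / (hi - lo) * (levels - 1)))
        parts.append(q.sum(axis=1))  # horizontal projection (one value per row)
        parts.append(q.sum(axis=0))  # vertical projection (one value per column)
    return np.concatenate(parts)

def build_vstm(frames, use_difference=True):
    """Stack per-frame feature vectors as rows to form the 2D map."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if use_difference:
        # The "subtraction from the subsequent frame" variant.
        frames = [b - a for a, b in zip(frames, frames[1:])]
    return np.vstack([frame_feature(f) for f in frames])

# Example: 40 random 48x64 "frames" -> a 39 x 112 map (with differencing).
clip = np.random.rand(40, 48, 64)
vstm = build_vstm(list(clip))
```

Note that the number of rows in the resulting map still varies with clip length; presumably the map is resized to a fixed shape before being fed to the CNN, but the record does not specify how the method handles this.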
doi_str_mv 10.1007/s10044-019-00788-1
format Article
fulltext fulltext
identifier ISSN: 1433-7541
ispartof Pattern analysis and applications : PAA, 2020-02, Vol.23 (1), p.265-279
issn 1433-7541
1433-755X
language eng
recordid cdi_proquest_journals_2352079321
source SpringerLink Journals - AutoHoldings
subjects Algorithms
Artificial neural networks
Computer Science
Human activity recognition
Human motion
Machine learning
Mapping
Neural networks
Pattern Recognition
Representations
Subtraction
Theoretical Advances
Wavelet transforms
title Video spatiotemporal mapping for human action recognition by convolutional neural network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T23%3A21%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Video%20spatiotemporal%20mapping%20for%20human%20action%20recognition%20by%20convolutional%20neural%20network&rft.jtitle=Pattern%20analysis%20and%20applications%20:%20PAA&rft.au=Zare,%20Amin&rft.date=2020-02-01&rft.volume=23&rft.issue=1&rft.spage=265&rft.epage=279&rft.pages=265-279&rft.issn=1433-7541&rft.eissn=1433-755X&rft_id=info:doi/10.1007/s10044-019-00788-1&rft_dat=%3Cproquest_cross%3E2352079321%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2352079321&rft_id=info:pmid/&rfr_iscdi=true