Multi-Label Multi-Class Action Recognition With Deep Spatio-Temporal Layers Based on Temporal Gaussian Mixtures

Current action recognition studies enjoy the benefits of two neural network branches, spatial and temporal. This work aims to extend the previous work by introducing a fusion of spatial and temporal branches to provide superior action recognition capability toward multi-label multi-class classification problems. In this paper, we propose three fusion models with different fusion strategies. We first build several efficient temporal Gaussian mixture (TGM) layers to form spatial and temporal branches to learn a set of features. In addition to these branches, we introduce a new deep spatio-temporal branch consisting of a series of TGM layers to learn the features that emerged from the existing branches. Each branch produces a temporal-aware feature that assists the model in understanding the underlying action in a video. To verify the performance of our proposed models, we performed extensive experiments using the well-known MultiTHUMOS benchmarking dataset. The results demonstrate the importance of our proposed deep fusion mechanism, contributing to the overall score while keeping the number of parameters small.
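The abstract describes the architecture only at a high level. As a rough illustration of the core building block, the sketch below shows what a temporal Gaussian mixture (TGM) layer could look like in PyTorch: a temporal convolution whose kernels are built as soft-attention mixtures of a small set of learned Gaussians, so only the Gaussian centers, widths, and mixing weights are trainable. Class, parameter, and argument names (`TGMLayer`, `n_gaussians`, `kernel_len`) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed names and shapes, not the authors' implementation) of a
# temporal Gaussian mixture layer: temporal kernels are mixtures of learned Gaussians.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TGMLayer(nn.Module):
    def __init__(self, channels, n_gaussians=8, kernel_len=15):
        super().__init__()
        self.kernel_len = kernel_len
        # Only these small tensors are learned, which keeps the parameter count low.
        self.centers = nn.Parameter(torch.randn(n_gaussians))        # Gaussian centers
        self.widths = nn.Parameter(torch.randn(n_gaussians))         # Gaussian widths
        self.mix = nn.Parameter(torch.randn(channels, n_gaussians))  # soft-attention mixing

    def gaussian_kernels(self):
        # Build n_gaussians normalized kernels of length kernel_len on [-1, 1].
        t = torch.linspace(-1.0, 1.0, self.kernel_len, device=self.centers.device)
        mu = torch.tanh(self.centers)[:, None]                  # keep centers inside the window
        sigma = F.softplus(self.widths)[:, None] + 1e-3         # strictly positive widths
        k = torch.exp(-0.5 * ((t[None, :] - mu) / sigma) ** 2)  # (M, L)
        return k / k.sum(dim=1, keepdim=True)                   # normalize over time

    def forward(self, x):
        # x: (batch, channels, time) per-frame features from a 2D/3D CNN backbone.
        w = torch.softmax(self.mix, dim=1)                     # (C, M) mixture weights
        kernels = (w @ self.gaussian_kernels()).unsqueeze(1)   # (C, 1, L): one kernel per channel
        return F.conv1d(x, kernels, padding=self.kernel_len // 2,
                        groups=x.shape[1])                     # depthwise temporal convolution
```

Building on that, one plausible reading of the three-branch design in the abstract is sketched below: a spatial branch (RGB features) and a temporal branch (optical-flow features) each pass through TGM layers, a deeper spatio-temporal branch of stacked TGM layers consumes their combined output, and independent per-class sigmoids produce multi-label scores. The fusion by summation, the layer counts, the feature width, and the default class count (65 classes in MultiTHUMOS) are assumptions made for illustration; the paper proposes three fusion strategies whose exact form is not given in this record.

```python
class ThreeBranchFusion(nn.Module):
    # Hypothetical arrangement of spatial, temporal, and deep spatio-temporal branches.
    def __init__(self, channels=1024, n_classes=65):
        super().__init__()
        self.spatial = TGMLayer(channels)       # fed with RGB-stream features
        self.temporal = TGMLayer(channels)      # fed with optical-flow-stream features
        self.spatio_temporal = nn.Sequential(   # deep branch: a series of TGM layers
            TGMLayer(channels), TGMLayer(channels))
        self.classifier = nn.Conv1d(channels, n_classes, kernel_size=1)

    def forward(self, rgb_feats, flow_feats):
        # Both inputs: (batch, channels, time) per-frame backbone features.
        s = self.spatial(rgb_feats)
        t = self.temporal(flow_feats)
        st = self.spatio_temporal(s + t)       # one possible fusion: element-wise sum
        logits = self.classifier(s + t + st)   # per-frame class logits
        return torch.sigmoid(logits)           # independent sigmoids for multi-label output
```

For multi-label training, a per-frame binary cross-entropy would be the usual pairing with this kind of head, for example `torch.nn.BCELoss` on the sigmoid outputs or `torch.nn.BCEWithLogitsLoss` on the logits before the sigmoid.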

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 173566-173575
Authors: Joefrie, Yuri Yudhaswana; Aono, Masaki
Format: Article
Language: English
Publisher: IEEE (Piscataway)
DOI: 10.1109/ACCESS.2020.3025931
ISSN: 2169-3536
Subjects: Action recognition; Computer architecture; Convolution; Kernel; motion detection; multi-branch network; multi-layer neural network; Neural networks; Optical imaging; Recognition; spatio-temporal branch; Three-dimensional displays; Two dimensional displays; videos
Source: IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Online Access: Full text