Multi-Label Multi-Class Action Recognition With Deep Spatio-Temporal Layers Based on Temporal Gaussian Mixtures

Current action recognition studies enjoy the benefits of two neural network branches, spatial and temporal. This work aims to extend the previous work by introducing a fusion of spatial and temporal branches to provide superior action recognition capability toward multi-label multi-class classification problems. In this paper, we propose three fusion models with different fusion strategies. We first build several efficient temporal Gaussian mixture (TGM) layers to form spatial and temporal branches to learn a set of features. In addition to these branches, we introduce a new deep spatio-temporal branch consisting of a series of TGM layers to learn the features that emerged from the existing branches. Each branch produces a temporal-aware feature that assists the model in understanding the underlying action in a video. To verify the performance of our proposed models, we performed extensive experiments using the well-known MultiTHUMOS benchmarking dataset. The results demonstrate the importance of our proposed deep fusion mechanism, contributing to the overall score while keeping the number of parameters small.
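The abstract describes the architecture only at a high level. As a rough illustration of the core building block, the sketch below shows what a temporal Gaussian mixture (TGM) layer could look like in PyTorch: a temporal convolution whose kernels are built as soft-attention mixtures of a small set of learned Gaussians, so only the Gaussian centers, widths, and mixing weights are trainable. Class, parameter, and argument names (`TGMLayer`, `n_gaussians`, `kernel_len`) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed names and shapes, not the authors' implementation) of a
# temporal Gaussian mixture layer: temporal kernels are mixtures of learned Gaussians.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TGMLayer(nn.Module):
    def __init__(self, channels, n_gaussians=8, kernel_len=15):
        super().__init__()
        self.kernel_len = kernel_len
        # Only these small tensors are learned, which keeps the parameter count low.
        self.centers = nn.Parameter(torch.randn(n_gaussians))        # Gaussian centers
        self.widths = nn.Parameter(torch.randn(n_gaussians))         # Gaussian widths
        self.mix = nn.Parameter(torch.randn(channels, n_gaussians))  # soft-attention mixing

    def gaussian_kernels(self):
        # Build n_gaussians normalized kernels of length kernel_len on [-1, 1].
        t = torch.linspace(-1.0, 1.0, self.kernel_len, device=self.centers.device)
        mu = torch.tanh(self.centers)[:, None]                  # keep centers inside the window
        sigma = F.softplus(self.widths)[:, None] + 1e-3         # strictly positive widths
        k = torch.exp(-0.5 * ((t[None, :] - mu) / sigma) ** 2)  # (M, L)
        return k / k.sum(dim=1, keepdim=True)                   # normalize over time

    def forward(self, x):
        # x: (batch, channels, time) per-frame features from a 2D/3D CNN backbone.
        w = torch.softmax(self.mix, dim=1)                     # (C, M) mixture weights
        kernels = (w @ self.gaussian_kernels()).unsqueeze(1)   # (C, 1, L): one kernel per channel
        return F.conv1d(x, kernels, padding=self.kernel_len // 2,
                        groups=x.shape[1])                     # depthwise temporal convolution
```

Building on that, one plausible reading of the three-branch design in the abstract is sketched below: a spatial branch (RGB features) and a temporal branch (optical-flow features) each pass through TGM layers, a deeper spatio-temporal branch of stacked TGM layers consumes their combined output, and independent per-class sigmoids produce multi-label scores. The fusion by summation, the layer counts, the feature width, and the default class count (65 classes in MultiTHUMOS) are assumptions made for illustration; the paper proposes three fusion strategies whose exact form is not given in this record.

```python
class ThreeBranchFusion(nn.Module):
    # Hypothetical arrangement of spatial, temporal, and deep spatio-temporal branches.
    def __init__(self, channels=1024, n_classes=65):
        super().__init__()
        self.spatial = TGMLayer(channels)       # fed with RGB-stream features
        self.temporal = TGMLayer(channels)      # fed with optical-flow-stream features
        self.spatio_temporal = nn.Sequential(   # deep branch: a series of TGM layers
            TGMLayer(channels), TGMLayer(channels))
        self.classifier = nn.Conv1d(channels, n_classes, kernel_size=1)

    def forward(self, rgb_feats, flow_feats):
        # Both inputs: (batch, channels, time) per-frame backbone features.
        s = self.spatial(rgb_feats)
        t = self.temporal(flow_feats)
        st = self.spatio_temporal(s + t)       # one possible fusion: element-wise sum
        logits = self.classifier(s + t + st)   # per-frame class logits
        return torch.sigmoid(logits)           # independent sigmoids for multi-label output
```

For multi-label training, a per-frame binary cross-entropy would be the usual pairing with this kind of head, for example `torch.nn.BCELoss` on the sigmoid outputs or `torch.nn.BCEWithLogitsLoss` on the logits before the sigmoid.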

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 173566-173575
Authors: Joefrie, Yuri Yudhaswana; Aono, Masaki
Format: Article
Language: English
Publisher: IEEE (Piscataway)
DOI: 10.1109/ACCESS.2020.3025931
ISSN: 2169-3536
Subjects: Action recognition; Computer architecture; Convolution; Kernel; motion detection; multi-branch network; multi-layer neural network; Neural networks; Optical imaging; Recognition; spatio-temporal branch; Three-dimensional displays; Two dimensional displays; videos
Source: IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Online Access: Full text