Capsule Boundary Network With 3D Convolutional Dynamic Routing for Temporal Action Detection
Published in: | IEEE transactions on circuits and systems for video technology 2022-05, Vol.32 (5), p.2962-2975 |
---|---|
Main authors: | Chen, Yaosen ; Guo, Bing ; Shen, Yan ; Wang, Wei ; Lu, Weichen ; Suo, Xinhua |
Format: | Article |
Language: | English |
container_end_page | 2975 |
---|---|
container_issue | 5 |
container_start_page | 2962 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 32 |
creator | Chen, Yaosen ; Guo, Bing ; Shen, Yan ; Wang, Wei ; Lu, Weichen ; Suo, Xinhua |
description | Temporal action detection is a challenging video-understanding task: complex backgrounds and rich action content make it hard to generate high-quality temporal proposals in untrimmed videos. Capsule networks avoid some limitations of convolutional neural networks, such as the invariance introduced by pooling, and can therefore better model the temporal relations that temporal action detection depends on. However, because of their very high computational cost, capsule networks are difficult to apply to this task. To address this issue, this paper proposes U-BlockConvCaps, a novel U-shaped capsule network framework that uses 3D convolutional dynamic routing with a k-nearest neighbor (k-NN) mechanism. Building on U-BlockConvCaps, we construct a Capsule Boundary Network (CapsBoundNet) for dense temporal action proposal generation. Specifically, the first module is a 1D convolutional layer that fuses the two-stream RGB and optical flow video features. A sampling module then processes the fused features to generate 2D start-end action proposal feature maps. Next, a multi-scale U-Block convolutional capsule module with 3D convolutional dynamic routing processes these proposal feature maps. Finally, the resulting feature maps are used to predict starting, ending, action classification, and action regression score maps, which help capture boundary and intersection-over-union (IoU) features. This work improves the dynamic routing algorithm of capsule networks and is the first in the literature to apply capsule networks to temporal action detection. Experimental results on the THUMOS14 benchmark show that CapsBoundNet clearly outperforms state-of-the-art methods: mAP at tIoU = 0.3, 0.4, and 0.5 improves from 63.6% to 70.0%, from 57.8% to 63.1%, and from 51.3% to 52.9%, respectively. CapsBoundNet also achieves competitive results on the ActivityNet1.3 action detection dataset. |
doi_str_mv | 10.1109/TCSVT.2021.3104226 |
format | Article |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
coden | ITCTEM |
orcid | 0000-0002-0679-4601 ; 0000-0001-8141-8430 ; 0000-0002-7212-1755 |
ieee_id | 9512048 |
ieee_url | https://ieeexplore.ieee.org/document/9512048 |
tpages | 14 |
peer_reviewed | true |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2022-05, Vol.32 (5), p.2962-2975 |
issn | 1051-8215 (print) ; 1558-2205 (electronic) |
language | eng |
recordid | cdi_crossref_primary_10_1109_TCSVT_2021_3104226 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Artificial neural networks ; capsule network ; Feature extraction ; Feature maps ; Heuristic algorithms ; Modules ; Optical flow (image analysis) ; Proposals ; Routing ; Task analysis ; Temporal action detection ; temporal action proposals ; Tensors ; Three-dimensional displays ; video features |
title | Capsule Boundary Network With 3D Convolutional Dynamic Routing for Temporal Action Detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T12%3A15%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Capsule%20Boundary%20Network%20With%203D%20Convolutional%20Dynamic%20Routing%20for%20Temporal%20Action%20Detection&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Chen,%20Yaosen&rft.date=2022-05-01&rft.volume=32&rft.issue=5&rft.spage=2962&rft.epage=2975&rft.pages=2962-2975&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2021.3104226&rft_dat=%3Cproquest_RIE%3E2659345409%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2659345409&rft_id=info:pmid/&rft_ieee_id=9512048&rfr_iscdi=true |
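The abstract reports results as mAP@tIoU and describes 2D start-end proposal maps scored against boundary and IoU targets. The Python sketch below computes the temporal IoU (tIoU) of two segments and builds a target map under a stated assumption: cell (s, e) of the 2D map stands for the proposal spanning snippets s through e, and its value is that proposal's best tIoU with any ground-truth segment. The record does not specify CapsBoundNet's exact map layout or regression target, so `iou_target_map` and its indexing are hypothetical. Under the usual THUMOS14 protocol, a detection counts as correct at tIoU = 0.5 when its tIoU with a not-yet-matched ground-truth instance of the same class is at least 0.5.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds or snippet indices


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union (tIoU) of two 1D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_target_map(num_steps: int, gts: List[Segment]) -> List[List[float]]:
    """Hypothetical 2D start-end target map (layout is an assumption).

    Cell [s][e] stands for the proposal covering snippets s..e, i.e. the
    interval [s, e + 1). Cells with e < s are invalid and stay 0.0.
    Each valid cell holds the maximum tIoU with any ground-truth segment.
    """
    grid = [[0.0] * num_steps for _ in range(num_steps)]
    for s in range(num_steps):
        for e in range(s, num_steps):
            proposal = (float(s), float(e + 1))
            grid[s][e] = max((temporal_iou(proposal, g) for g in gts), default=0.0)
    return grid


if __name__ == "__main__":
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # overlap 2 / union 6
    m = iou_target_map(4, [(1.0, 3.0)])
    print(m[1][2])  # proposal [1, 3) coincides with the ground truth
```

Thresholding `temporal_iou` at 0.3, 0.4, or 0.5 is what distinguishes the three mAP figures quoted in the abstract; a higher threshold demands tighter boundaries.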
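The abstract states that the work improves the dynamic routing algorithm of capsule networks. For reference, here is a minimal pure-Python sketch of the standard routing-by-agreement procedure from Sabour et al. (2017), the baseline such improvements start from. It is not the paper's k-NN 3D convolutional routing, whose details this record does not give; `squash`, `softmax`, and `dynamic_routing` are illustrative names.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Standard routing-by-agreement (baseline, not the paper's variant).

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns the output capsule vectors v_j.
    """
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij
    v: List[Vector] = []
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients; each row sums to 1 over j
        v = []
        for j in range(n_out):
            # s_j = sum_i c_ij * u_hat_ij
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if it < iterations - 1:
            # Agreement update: raise b_ij when u_hat_ij points along v_j.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v


if __name__ == "__main__":
    # Two inputs agree on output 0 and contradict each other on output 1.
    preds = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]]]
    print(dynamic_routing(preds))
```

Each routing iteration visits every (input, output) capsule pair, so the cost grows with iterations × n_in × n_out × dim. Applied densely over 2D proposal maps, this is the kind of expense the abstract identifies as the obstacle to using capsule networks for temporal action detection.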