Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling
Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing from the human visual perception mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low-resolution and decides to skip or request high-resolution features at salient regions for further processing. We validate the system on EPIC-KITCHENS and UCF-101 (split-1) datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with those from state-of-the-art baselines. Source code is available at https://github.com/knmac/adaptive_spatiotemporal.
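The abstract describes a two-stage decision: pre-scan each frame at low resolution, skip frames whose low-resolution saliency is negligible, and otherwise request high-resolution features only around the salient peaks. The following is a minimal NumPy sketch of that control flow; the function names, the temporal-difference saliency proxy, and all thresholds are illustrative assumptions rather than the authors' implementation, which lives at the linked repository.

```python
import numpy as np

def downsample(frame, factor=4):
    """Cheap low-resolution pre-scan of the frame (strided subsampling)."""
    return frame[::factor, ::factor]

def saliency(low, prev_low):
    """Toy saliency proxy: temporal difference on the low-resolution stream,
    averaged over colour channels if present."""
    d = np.abs(low.astype(np.float32) - prev_low.astype(np.float32))
    return d.mean(axis=-1) if d.ndim == 3 else d

def adaptive_sample(frame, prev_frame, factor=4, skip_thresh=5.0, top_k=1, patch=64):
    """Return None to skip the frame entirely, or a list of high-resolution
    crops centred on the most salient low-resolution locations."""
    sal = saliency(downsample(frame, factor), downsample(prev_frame, factor))
    if sal.mean() < skip_thresh:            # scene barely changed: skip frame
        return None
    crops = []
    for _ in range(top_k):                  # request high-res patches at saliency peaks
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        cy, cx = y * factor, x * factor     # map the peak back to full resolution
        y0, x0 = max(cy - patch // 2, 0), max(cx - patch // 2, 0)
        crops.append(frame[y0:y0 + patch, x0:x0 + patch])
        sal[max(y - 8, 0):y + 8, max(x - 8, 0):x + 8] = 0.0  # suppress this neighbourhood
    return crops

# Usage: a static frame is skipped; a frame with localized motion yields crops
# for a (hypothetical) high-resolution recognition branch.
prev = np.zeros((224, 224, 3), dtype=np.uint8)
cur = prev.copy()
cur[100:140, 100:140] = 255                 # simulate motion in one region
print(adaptive_sample(prev, prev))          # -> None (frame skipped)
print(len(adaptive_sample(cur, prev)))      # -> 1 crop requested
```

The point the sketch mirrors is that the skip/attend decision is made entirely on the cheap low-resolution stream, so the expensive high-resolution branch runs only when and where it is needed.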
Saved in:
Published in: | IEEE transactions on image processing 2023, Vol.32, p.5245-5256 |
---|---|
Main authors: | Mac, Khoi-Nguyen C.; Do, Minh N.; Vo, Minh P. |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 5256 |
---|---|
container_issue | |
container_start_page | 5245 |
container_title | IEEE transactions on image processing |
container_volume | 32 |
creator | Mac, Khoi-Nguyen C.; Do, Minh N.; Vo, Minh P. |
description | Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing from the human visual perception mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low-resolution and decides to skip or request high-resolution features at salient regions for further processing. We validate the system on EPIC-KITCHENS and UCF-101 (split-1) datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with those from state-of-the-art baselines. Source code is available in https://github.com/knmac/adaptive_spatiotemporal . |
doi_str_mv | 10.1109/TIP.2023.3310661 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1057-7149 |
ispartof | IEEE transactions on image processing, 2023, Vol.32, p.5245-5256 |
issn | 1057-7149 1941-0042 |
language | eng |
recordid | cdi_proquest_miscellaneous_2860405913 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; action recognition; Activity recognition; Adaptation models; Adaptive sampling; Biological system modeling; Computational modeling; Context; Redundancy; Source code; spatiotemporal; Spatiotemporal phenomena; Task analysis; Videos; Vision; Visual perception; Visualization; Wearable technology |
title | Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T16%3A14%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Human%20Vision%20Inspired%20Action%20Recognition%20Using%20Adaptive%20Spatiotemporal%20Sampling&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Mac,%20Khoi-Nguyen%20C.&rft.date=2023&rft.volume=32&rft.spage=5245&rft.epage=5256&rft.pages=5245-5256&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2023.3310661&rft_dat=%3Cproquest_ieee_%3E2866492921%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2866492921&rft_id=info:pmid/37651500&rft_ieee_id=10236596&rfr_iscdi=true |