A Coarse-to-Fine Framework for Resource Efficient Video Recognition


Detailed description

Bibliographic details
Published in: International journal of computer vision, 2021-11, Vol.129 (11), p.2965-2977
Publisher: New York: Springer US
Main authors: Wu, Zuxuan; Li, Hengduo; Zheng, Yingbin; Xiong, Caiming; Jiang, Yu-Gang; Davis, Larry S
Format: Article
Language: English
Online access: Full text
Description: Deep neural networks have demonstrated remarkable recognition results on video classification; however, these accuracy gains come at the expense of large amounts of computational resources. In this paper, we introduce LiteEval for resource-efficient video recognition. LiteEval is a coarse-to-fine framework that dynamically allocates computation on a per-video basis and can be deployed in both online and offline settings. Operating by default on low-cost features computed from images at a coarse scale, LiteEval adaptively determines on the fly when to read in more discriminative yet computationally expensive features. This is achieved through the interaction of a coarse RNN and a fine RNN, together with a conditional gating module that automatically learns when to use more computation conditioned on incoming frames. We conduct extensive experiments on three large-scale video benchmarks, FCVID, ActivityNet and Kinetics, and demonstrate, among other things, that LiteEval offers impressive recognition performance while using significantly less computation in both online and offline settings.
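The description above amounts to a per-frame decision: always compute cheap coarse features, and invoke the expensive fine pathway only when a gate fires. The toy Python sketch below illustrates that control flow only; the stub feature extractors and the difference-based gate are invented stand-ins for the paper's CNN backbones, coarse/fine RNNs, and learned conditional gating module, which this record does not specify.

```python
def coarse_features(frame):
    """Cheap features from a low-resolution frame (stub: pixel mean)."""
    return sum(frame) / len(frame)

def fine_features(frame):
    """Expensive, discriminative features (stub: second moment)."""
    return sum(x * x for x in frame) / len(frame)

def liteeval_sketch(frames, threshold=0.5):
    """Run a coarse-to-fine loop over a list of frames.

    The gate here fires when the cheap feature changes noticeably
    between consecutive frames -- a hand-written stand-in for the
    learned conditional gating module described in the abstract.
    """
    prev_coarse = None
    fine_calls = 0
    per_frame = []
    for frame in frames:
        c = coarse_features(frame)
        use_fine = prev_coarse is None or abs(c - prev_coarse) > threshold
        # Only pay for the expensive pathway when the gate fires.
        feat = fine_features(frame) if use_fine else c
        fine_calls += int(use_fine)
        per_frame.append(feat)
        prev_coarse = c
    return per_frame, fine_calls

# A "video" with one scene change: the fine pathway runs only twice.
frames = [[10, 10]] * 4 + [[0, 0]] * 4
feats, fine_calls = liteeval_sketch(frames)
print(fine_calls)  # 2 of 8 frames used the expensive features
```

On a mostly static clip the gate stays closed, so most frames cost only the coarse computation; this is the sense in which computation is allocated per video and per frame.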
DOI: 10.1007/s11263-021-01508-1
ISSN: 0920-5691
EISSN: 1573-1405
Source: SpringerNature Journals
Subjects: Artificial Intelligence
Artificial neural networks
Classification
Computer Imaging
Computer Science
Experiments
Image Processing and Computer Vision
Neural networks
Pattern Recognition
Pattern Recognition and Graphics
Recognition
Special Issue on Deep Learning for Video Analysis and Compression
User generated content
Video data
Vision
Title: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A26%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Coarse-to-Fine%20Framework%20for%20Resource%20Efficient%20Video%20Recognition&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Wu,%20Zuxuan&rft.date=2021-11-01&rft.volume=129&rft.issue=11&rft.spage=2965&rft.epage=2977&rft.pages=2965-2977&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-021-01508-1&rft_dat=%3Cgale_proqu%3EA679328216%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2582666505&rft_id=info:pmid/&rft_galeid=A679328216&rfr_iscdi=true