Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been made in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). By arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN outperforms several state-of-the-art approaches, with competitive results reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large-scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
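The abstract names the mechanism (relationship-aware regularizers added to a DNN's training objective) without giving its formulation. Below is a minimal PyTorch sketch of what such an objective could look like. It is not the authors' implementation: the network shape, the MSE form of the feature-relationship term, the graph-Laplacian form of the class-relationship term, the `alpha`/`beta` weights, and all names (`RegularizedFusionNet`, `rdnn_style_loss`, `class_corr`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegularizedFusionNet(nn.Module):
    """Multi-feature video classifier; layer sizes and names are illustrative."""

    def __init__(self, feat_dims, hidden_dim, num_classes):
        super().__init__()
        # one branch per input video feature (e.g., appearance, motion, audio)
        self.branches = nn.ModuleList([nn.Linear(d, hidden_dim) for d in feat_dims])
        self.classifier = nn.Linear(hidden_dim * len(feat_dims), num_classes)

    def forward(self, feats):
        # feats: list of (batch, feat_dims[i]) tensors, one per feature type
        hidden = [torch.relu(b(x)) for b, x in zip(self.branches, feats)]
        return self.classifier(torch.cat(hidden, dim=1)), hidden

def rdnn_style_loss(logits, targets, hidden, class_corr, weight,
                    alpha=0.1, beta=0.1):
    # multi-label classification term
    cls = F.binary_cross_entropy_with_logits(logits, targets)
    # feature-relationship regularizer (assumed form): pull the per-branch
    # hidden codes of the same video toward one another, so that structure
    # shared across features survives fusion
    feat_reg = sum(F.mse_loss(h, hidden[0]) for h in hidden[1:]) / max(len(hidden) - 1, 1)
    # class-relationship regularizer (assumed form): a graph-Laplacian penalty
    # built from an estimated (C, C) class-correlation matrix, which makes the
    # classifier weights of semantically related classes similar
    lap = torch.diag(class_corr.sum(dim=1)) - class_corr        # (C, C)
    class_reg = torch.trace(weight.t() @ lap @ weight) / weight.numel()
    return cls + alpha * feat_reg + beta * class_reg
```

In this sketch, `class_corr` would be estimated beforehand, for example from label co-occurrence statistics on the training set; the Laplacian penalty then pulls the weight vectors of frequently co-occurring classes toward each other, which is one standard way to encode inter-class relationships in a learned model.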

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018-02, Vol. 40 (2), p. 352-364
Main authors: Jiang, Yu-Gang; Wu, Zuxuan; Wang, Jun; Xue, Xiangyang; Chang, Shih-Fu
Format: Article
Language: English
Publisher: IEEE (United States)
DOI: 10.1109/TPAMI.2017.2670560
ISSN: 0162-8828
EISSN: 1939-3539; 2160-9292
PMID: 28221992
CODEN: ITPIDJ
Subjects:
Artificial neural networks
benchmark dataset
Benchmark testing
Benchmarks
class relationships
Classification
Correlation
deep neural networks
Feature extraction
feature fusion
Internet
Neural networks
regularization
Semantics
State of the art
Video categorization
Visualization