Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been made in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). By arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN outperforms several state-of-the-art approaches, with competitive results reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large-scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
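The abstract names the mechanism (relationship-aware regularizers added to a DNN's training objective) without giving its formulation. Below is a minimal PyTorch sketch of what such an objective could look like. It is not the authors' implementation: the network shape, the MSE form of the feature-relationship term, the graph-Laplacian form of the class-relationship term, the `alpha`/`beta` weights, and all names (`RegularizedFusionNet`, `rdnn_style_loss`, `class_corr`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegularizedFusionNet(nn.Module):
    """Multi-feature video classifier; layer sizes and names are illustrative."""

    def __init__(self, feat_dims, hidden_dim, num_classes):
        super().__init__()
        # one branch per input video feature (e.g., appearance, motion, audio)
        self.branches = nn.ModuleList([nn.Linear(d, hidden_dim) for d in feat_dims])
        self.classifier = nn.Linear(hidden_dim * len(feat_dims), num_classes)

    def forward(self, feats):
        # feats: list of (batch, feat_dims[i]) tensors, one per feature type
        hidden = [torch.relu(b(x)) for b, x in zip(self.branches, feats)]
        return self.classifier(torch.cat(hidden, dim=1)), hidden

def rdnn_style_loss(logits, targets, hidden, class_corr, weight,
                    alpha=0.1, beta=0.1):
    # multi-label classification term
    cls = F.binary_cross_entropy_with_logits(logits, targets)
    # feature-relationship regularizer (assumed form): pull the per-branch
    # hidden codes of the same video toward one another, so that structure
    # shared across features survives fusion
    feat_reg = sum(F.mse_loss(h, hidden[0]) for h in hidden[1:]) / max(len(hidden) - 1, 1)
    # class-relationship regularizer (assumed form): a graph-Laplacian penalty
    # built from an estimated (C, C) class-correlation matrix, which makes the
    # classifier weights of semantically related classes similar
    lap = torch.diag(class_corr.sum(dim=1)) - class_corr        # (C, C)
    class_reg = torch.trace(weight.t() @ lap @ weight) / weight.numel()
    return cls + alpha * feat_reg + beta * class_reg
```

In this sketch, `class_corr` would be estimated beforehand, for example from label co-occurrence statistics on the training set; the Laplacian penalty then pulls the weight vectors of frequently co-occurring classes toward each other, which is one standard way to encode inter-class relationships in a learned model.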

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018-02, Vol. 40 (2), p. 352-364
Main authors: Jiang, Yu-Gang; Wu, Zuxuan; Wang, Jun; Xue, Xiangyang; Chang, Shih-Fu
Format: Article
Language: English
Publisher: IEEE (United States)
DOI: 10.1109/TPAMI.2017.2670560
ISSN: 0162-8828
EISSN: 1939-3539; 2160-9292
PMID: 28221992
CODEN: ITPIDJ
Subjects:
Artificial neural networks
benchmark dataset
Benchmark testing
Benchmarks
class relationships
Classification
Correlation
deep neural networks
Feature extraction
feature fusion
Internet
Neural networks
regularization
Semantics
State of the art
Video categorization
Visualization