Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks
In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN produces better performance over several state-of-the-art approaches. Competitive results are reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2018-02, Vol.40 (2), p.352-364 |
---|---|
Main authors: | Jiang, Yu-Gang; Wu, Zuxuan; Wang, Jun; Xue, Xiangyang; Chang, Shih-Fu |
Format: | Article |
Language: | eng |
Keywords: | Artificial neural networks; benchmark dataset; deep neural networks; feature fusion; regularization; video categorization |
Online access: | Order full text |
container_end_page | 364 |
---|---|
container_issue | 2 |
container_start_page | 352 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 40 |
creator | Jiang, Yu-Gang; Wu, Zuxuan; Wang, Jun; Xue, Xiangyang; Chang, Shih-Fu |
description | In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN produces better performance over several state-of-the-art approaches. Competitive results are reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories. |
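The description states that the class relationships are imposed as regularization terms during DNN training. As a loose illustration only (the function name, the graph-Laplacian form, and the weight `lam` are assumptions of this sketch, not the paper's actual rDNN formulation), one common way to encode "correlated classes should have similar classifiers" is a Laplacian penalty on the output-layer weights:

```python
import numpy as np

def class_relationship_penalty(W, C):
    """Graph-Laplacian penalty: classes with high correlation C[i, j]
    are encouraged to have similar output-layer weight rows.

    W: (num_classes, dim) output-layer weight matrix
    C: (num_classes, num_classes) symmetric class-correlation matrix
    """
    L = np.diag(C.sum(axis=1)) - C   # graph Laplacian of the class graph
    # tr(W^T L W) = 0.5 * sum_ij C[i, j] * ||W[i] - W[j]||^2
    return np.trace(W.T @ L @ W)

def regularized_loss(data_loss, W, C, lam=0.1):
    """Total objective: ordinary data loss plus the relationship penalty."""
    return data_loss + lam * class_relationship_penalty(W, C)
```

Minimizing such a penalty pulls the weight vectors of semantically related classes toward each other, which is the general effect the abstract attributes to the class-relationship regularizer.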
doi_str_mv | 10.1109/TPAMI.2017.2670560 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828; EISSN: 1939-3539; EISSN: 2160-9292; DOI: 10.1109/TPAMI.2017.2670560; PMID: 28221992 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2018-02, Vol.40 (2), p.352-364 |
issn | 0162-8828; 1939-3539; 2160-9292 |
language | eng |
recordid | cdi_proquest_miscellaneous_1870986777 |
source | IEEE Electronic Library (IEL) |
subjects | Artificial neural networks; benchmark dataset; Benchmark testing; Benchmarks; class relationships; Classification; Correlation; deep neural networks; Feature extraction; feature fusion; Internet; Neural networks; regularization; Semantics; State of the art; Video categorization; Visualization |
title | Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks |