Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling
The key benefit of the video hashing technique is fast retrieval, which stems from the high efficiency of binary computation. Current video hashing approaches are thus mainly aimed at learning compact binary codes that represent video content accurately. However, they may overlook the efficiency of generating the hash codes, i.e., designing lightweight neural networks. This article proposes an Efficient Unsupervised Video Hashing (EUVH) method, which not only computes compact hash codes but also employs a lightweight deep model. Specifically, we present an MLP-based model in which the video tensor is split into several groups and multiple axial contexts are explored to refine them separately in parallel. The axial contexts refer to the dynamics aggregated from different axial scales, including long-, middle-, and short-range dependencies. The group operation significantly reduces the computational cost of the MLP backbone. Moreover, to achieve compact video hash codes, three structural losses are utilized. As demonstrated by the experiments, the three structures are highly complementary in approximating the real data structure. We conduct extensive experiments on three benchmark datasets for the unsupervised video hashing task and show the superior trade-off between performance and computational cost of our EUVH compared to the state of the art.
Saved in:
Published in: | IEEE transactions on multimedia, 2024, Vol. 26, p. 7438-7450 |
---|---|
Main authors: | Duan, Jingru; Hao, Yanbin; Zhu, Bin; Cheng, Lechao; Zhou, Pengyuan; Wang, Xiang |
Format: | Article |
Language: | eng |
Subjects: | Binary codes; Codes; Computational efficiency; Computational modeling; Computing costs; Context modeling; data structure; Data structures; deep neural network; Feature extraction; Hash functions; large-scale retrieval; Lightweight; Neural networks; Tensors; Transformers; Video hashing |
Online access: | Order full text |
container_end_page | 7450 |
---|---|
container_issue | |
container_start_page | 7438 |
container_title | IEEE transactions on multimedia |
container_volume | 26 |
creator | Duan, Jingru; Hao, Yanbin; Zhu, Bin; Cheng, Lechao; Zhou, Pengyuan; Wang, Xiang |
description | The key benefit of the video hashing technique is fast retrieval, which stems from the high efficiency of binary computation. Current video hashing approaches are thus mainly aimed at learning compact binary codes that represent video content accurately. However, they may overlook the efficiency of generating the hash codes, i.e., designing lightweight neural networks. This article proposes an Efficient Unsupervised Video Hashing (EUVH) method, which not only computes compact hash codes but also employs a lightweight deep model. Specifically, we present an MLP-based model in which the video tensor is split into several groups and multiple axial contexts are explored to refine them separately in parallel. The axial contexts refer to the dynamics aggregated from different axial scales, including long-, middle-, and short-range dependencies. The group operation significantly reduces the computational cost of the MLP backbone. Moreover, to achieve compact video hash codes, three structural losses are utilized. As demonstrated by the experiments, the three structures are highly complementary in approximating the real data structure. We conduct extensive experiments on three benchmark datasets for the unsupervised video hashing task and show the superior trade-off between performance and computational cost of our EUVH compared to the state of the art. |
doi_str_mv | 10.1109/TMM.2024.3368924 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2024, Vol.26, p.7438-7450 |
issn | 1520-9210 (print); 1941-0077 (electronic) |
language | eng |
recordid | cdi_ieee_primary_10443557 |
source | IEEE Electronic Library (IEL) |
subjects | Binary codes; Codes; Computational efficiency; Computational modeling; Computing costs; Context modeling; data structure; Data structures; deep neural network; Feature extraction; Hash functions; large-scale retrieval; Lightweight; Neural networks; Tensors; Transformers; Video hashing |
title | Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A15%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Unsupervised%20Video%20Hashing%20With%20Contextual%20Modeling%20and%20Structural%20Controlling&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Duan,%20Jingru&rft.date=2024&rft.volume=26&rft.spage=7438&rft.epage=7450&rft.pages=7438-7450&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2024.3368924&rft_dat=%3Cproquest_RIE%3E3044650155%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3044650155&rft_id=info:pmid/&rft_ieee_id=10443557&rfr_iscdi=true |
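The abstract above describes the method only in prose. As a toy illustration — not the authors' EUVH implementation — the sketch below shows the two general ideas the abstract names: splitting a video feature tensor into channel groups, refining each group with a temporal ("axial") context of a different scale, and binarizing a pooled clip vector into a compact hash code compared by Hamming distance. The function names, the random projection, and the window sizes are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def axial_context_hash(video_feats, num_groups=4, windows=(1, 3, 8),
                       num_bits=32, seed=0):
    """Toy sketch: group-wise refinement with multi-scale temporal
    context, followed by sign-based binarization.

    video_feats: (T, D) array of per-frame features, D divisible
    by num_groups.
    """
    T, D = video_feats.shape
    assert D % num_groups == 0
    groups = np.split(video_feats, num_groups, axis=1)

    refined = []
    for i, g in enumerate(groups):
        w = windows[i % len(windows)]        # context scale for this group
        kernel = np.ones(w) / w              # moving average over w frames
        ctx = np.apply_along_axis(
            lambda c: np.convolve(c, kernel, mode="same"), 0, g)
        refined.append(g + ctx)              # residual refinement
    mixed = np.concatenate(refined, axis=1)  # back to (T, D)

    clip = mixed.mean(axis=0)                # pool frames to a clip vector
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((D, num_bits))  # fixed random projection
    return (clip @ proj >= 0).astype(np.uint8)  # compact binary code

def hamming(a, b):
    """Hamming distance between two binary codes (the fast comparison
    that motivates hashing-based retrieval)."""
    return int(np.count_nonzero(a != b))
```

In a retrieval setting, every database video would be reduced to such a code once, and a query would be compared against all codes by `hamming` (in practice, XOR plus popcount on packed bits), which is the "high efficiency of binary computation" the abstract refers to.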