HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval


Detailed Description

Saved in:
Bibliographic Details
Main Authors: Zhang, Xi, Zhou, Siyu, Feng, Jiashi, Lai, Hanjiang, Li, Bo, Pan, Yan, Yin, Jian, Yan, Shuicheng
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Zhang, Xi
Zhou, Siyu
Feng, Jiashi
Lai, Hanjiang
Li, Bo
Pan, Yan
Yin, Jian
Yan, Shuicheng
description With the rapid growth of multi-modal data, hashing methods for cross-modal retrieval have received considerable attention. Deep-network-based cross-modal hashing methods are appealing because they integrate feature learning and hash coding into end-to-end trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To address this problem, we propose an adversarial hashing network with an attention mechanism that enhances the measurement of content similarities by selectively focusing on the informative parts of multi-modal data. The proposed adversarial network, HashGAN, consists of three building blocks: 1) the feature learning module, which obtains feature representations; 2) the generative attention module, which generates an attention mask used to obtain the attended (foreground) and unattended (background) feature representations; and 3) the discriminative hash coding module, which learns hash functions that preserve the similarities between different modalities. In our framework, the generative module and the discriminative module are trained adversarially: the generator learns to prevent the discriminator from preserving the similarities of multi-modal data w.r.t. the background feature representations, while the discriminator aims to preserve the similarities of multi-modal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other state-of-the-art cross-modal hashing methods.
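The description above outlines a three-module design (feature learning, generative attention, discriminative hash coding) trained adversarially. The following PyTorch sketch only illustrates that training scheme under stated assumptions; the layer sizes, the simple pairwise similarity loss, and all hyperparameters are hypothetical and not taken from the paper.

```python
# Minimal, illustrative sketch of the adversarial attention-hashing scheme described
# in the abstract. All module shapes, the pairwise loss, and hyperparameters are
# assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureModule(nn.Module):
    """Feature learning module: maps one modality's raw features to a shared space."""
    def __init__(self, in_dim, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class AttentionGenerator(nn.Module):
    """Generative attention module: mask splits features into foreground/background."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
    def forward(self, feat):
        mask = self.net(feat)
        return mask * feat, (1.0 - mask) * feat  # attended (fg), unattended (bg)

class HashDiscriminator(nn.Module):
    """Discriminative hash coding module: relaxed K-bit codes in [-1, 1] via tanh."""
    def __init__(self, feat_dim=512, bits=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, bits), nn.Tanh())
    def forward(self, feat):
        return self.net(feat)

def similarity_loss(code_a, code_b, sim):
    """Pairwise loss: similar cross-modal pairs get close codes, dissimilar ones far apart."""
    dist = F.pairwise_distance(code_a, code_b)
    return (sim * dist.pow(2) + (1 - sim) * F.relu(2.0 - dist).pow(2)).mean()

# Toy one-step training loop for the adversarial game described in the abstract.
img_feat_net, txt_feat_net = FeatureModule(4096), FeatureModule(1000)
gen, disc = AttentionGenerator(), HashDiscriminator()
opt_d = torch.optim.Adam(
    list(img_feat_net.parameters()) + list(txt_feat_net.parameters()) + list(disc.parameters()),
    lr=1e-4)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)

img, txt = torch.randn(8, 4096), torch.randn(8, 1000)  # dummy image/text inputs
sim = torch.randint(0, 2, (8,)).float()                 # 1 = semantically similar pair

# Discriminator step: preserve similarities w.r.t. both foreground and background codes.
img_fg, img_bg = gen(img_feat_net(img))
txt_fg, txt_bg = gen(txt_feat_net(txt))
d_loss = (similarity_loss(disc(img_fg), disc(txt_fg), sim)
          + similarity_loss(disc(img_bg), disc(txt_bg), sim))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: make the background codes fail to preserve the similarities.
img_fg, img_bg = gen(img_feat_net(img))
txt_fg, txt_bg = gen(txt_feat_net(txt))
g_loss = -similarity_loss(disc(img_bg), disc(txt_bg), sim)
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```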
doi_str_mv 10.48550/arxiv.1711.09347
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1711.09347
language eng
recordid cdi_arxiv_primary_1711_09347
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T21%3A57%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HashGAN:Attention-aware%20Deep%20Adversarial%20Hashing%20for%20Cross%20Modal%20Retrieval&rft.au=Zhang,%20Xi&rft.date=2017-11-26&rft_id=info:doi/10.48550/arxiv.1711.09347&rft_dat=%3Carxiv_GOX%3E1711_09347%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true