HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval


Detailed Description

Saved in:
Bibliographic Details
Main Authors: Zhang, Xi, Zhou, Siyu, Feng, Jiashi, Lai, Hanjiang, Li, Bo, Pan, Yan, Yin, Jian, Yan, Shuicheng
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Zhang, Xi
Zhou, Siyu
Feng, Jiashi
Lai, Hanjiang
Li, Bo
Pan, Yan
Yin, Jian
Yan, Shuicheng
description With the rapid growth of multi-modal data, hashing methods for cross-modal retrieval have received considerable attention. Deep-network-based cross-modal hashing methods are appealing because they integrate feature learning and hash coding into end-to-end trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To address this problem, we propose an adversarial hashing network with an attention mechanism that enhances the measurement of content similarities by selectively focusing on the informative parts of multi-modal data. The proposed adversarial network, HashGAN, consists of three building blocks: 1) the feature learning module, which obtains feature representations; 2) the generative attention module, which generates an attention mask used to obtain the attended (foreground) and unattended (background) feature representations; and 3) the discriminative hash coding module, which learns hash functions that preserve the similarities between different modalities. In our framework, the generative module and the discriminative module are trained adversarially: the generator learns to prevent the discriminator from preserving the similarities of multi-modal data w.r.t. the background feature representations, while the discriminator aims to preserve the similarities of multi-modal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other state-of-the-art cross-modal hashing methods.
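The description above outlines a three-module design (feature learning, generative attention, discriminative hash coding) trained adversarially. The following PyTorch sketch only illustrates that training scheme under stated assumptions; the layer sizes, the simple pairwise similarity loss, and all hyperparameters are hypothetical and not taken from the paper.

```python
# Minimal, illustrative sketch of the adversarial attention-hashing scheme described
# in the abstract. All module shapes, the pairwise loss, and hyperparameters are
# assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureModule(nn.Module):
    """Feature learning module: maps one modality's raw features to a shared space."""
    def __init__(self, in_dim, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class AttentionGenerator(nn.Module):
    """Generative attention module: mask splits features into foreground/background."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
    def forward(self, feat):
        mask = self.net(feat)
        return mask * feat, (1.0 - mask) * feat  # attended (fg), unattended (bg)

class HashDiscriminator(nn.Module):
    """Discriminative hash coding module: relaxed K-bit codes in [-1, 1] via tanh."""
    def __init__(self, feat_dim=512, bits=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, bits), nn.Tanh())
    def forward(self, feat):
        return self.net(feat)

def similarity_loss(code_a, code_b, sim):
    """Pairwise loss: similar cross-modal pairs get close codes, dissimilar ones far apart."""
    dist = F.pairwise_distance(code_a, code_b)
    return (sim * dist.pow(2) + (1 - sim) * F.relu(2.0 - dist).pow(2)).mean()

# Toy one-step training loop for the adversarial game described in the abstract.
img_feat_net, txt_feat_net = FeatureModule(4096), FeatureModule(1000)
gen, disc = AttentionGenerator(), HashDiscriminator()
opt_d = torch.optim.Adam(
    list(img_feat_net.parameters()) + list(txt_feat_net.parameters()) + list(disc.parameters()),
    lr=1e-4)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)

img, txt = torch.randn(8, 4096), torch.randn(8, 1000)  # dummy image/text inputs
sim = torch.randint(0, 2, (8,)).float()                 # 1 = semantically similar pair

# Discriminator step: preserve similarities w.r.t. both foreground and background codes.
img_fg, img_bg = gen(img_feat_net(img))
txt_fg, txt_bg = gen(txt_feat_net(txt))
d_loss = (similarity_loss(disc(img_fg), disc(txt_fg), sim)
          + similarity_loss(disc(img_bg), disc(txt_bg), sim))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: make the background codes fail to preserve the similarities.
img_fg, img_bg = gen(img_feat_net(img))
txt_fg, txt_bg = gen(txt_feat_net(txt))
g_loss = -similarity_loss(disc(img_bg), disc(txt_bg), sim)
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```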
doi_str_mv 10.48550/arxiv.1711.09347
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1711.09347
language eng
recordid cdi_arxiv_primary_1711_09347
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T21%3A57%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HashGAN:Attention-aware%20Deep%20Adversarial%20Hashing%20for%20Cross%20Modal%20Retrieval&rft.au=Zhang,%20Xi&rft.date=2017-11-26&rft_id=info:doi/10.48550/arxiv.1711.09347&rft_dat=%3Carxiv_GOX%3E1711_09347%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true