Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. The task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and between-class discrepancy for recognizing novel classes, and 2) the sketch query and the photo gallery are in different modalities. The metric-learning viewpoint benefits ZS-SBIR in two ways. First, it facilitates improvement through recent good practices in deep metric learning (DML). By combining the two fundamental learning approaches in DML, i.e., classification training and pairwise training, we set up a strong baseline for ZS-SBIR. Without bells and whistles, this baseline achieves competitive retrieval accuracy. Second, it provides the insight that properly suppressing the modality gap is critical. To this end, we design a novel method named Modality-Aware Triplet Hard Mining (MATHM). MATHM enhances the baseline with three types of pairwise learning, i.e., a cross-modality sample pair, a within-modality sample pair, and their combination. We also design an adaptive weighting method to dynamically balance these three components during training. Experimental results confirm that MATHM brings another round of significant improvement over the strong baseline and sets new state-of-the-art performance. For example, on the TU-Berlin dataset, we achieve 47.88+2.94% mAP@all and 58.28+2.34% Prec@100. Code will be publicly available at: https://github.com/huangzongheng/MATHM.
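
The abstract does not spell out the exact loss formulation, so the following PyTorch sketch is only an illustration of the idea it describes: hard-mined triplet losses restricted to cross-modality pairs, within-modality pairs, and all pairs, balanced by an adaptive weighting. The function names, the margin value, the softmax-based weighting, and the reading of "combination" as "all pairs" are assumptions for illustration, not the authors' implementation (see the linked MATHM repository for that).

```python
import torch
import torch.nn.functional as F


def hardest_triplet_loss(dist, pos_mask, neg_mask, margin=0.3):
    """For each anchor (row of dist), pick the farthest allowed positive and
    the closest allowed negative, then apply a hinge on their gap."""
    big = 1e9
    hardest_pos = (dist - big * (~pos_mask).float()).max(dim=1).values
    hardest_neg = (dist + big * (~neg_mask).float()).min(dim=1).values
    # Keep only anchors that actually have both a positive and a negative.
    valid = pos_mask.any(dim=1) & neg_mask.any(dim=1)
    if not valid.any():
        return dist.new_zeros(())
    return F.relu(hardest_pos - hardest_neg + margin)[valid].mean()


def mathm_style_loss(feats, labels, modality, margin=0.3):
    """feats: (N, D) embeddings of sketches and photos in one mixed batch.
    labels: (N,) class ids.  modality: (N,) 0 = sketch, 1 = photo."""
    dist = torch.cdist(feats, feats)                      # (N, N) L2 distances
    same_cls = labels[:, None] == labels[None, :]
    same_mod = modality[:, None] == modality[None, :]
    not_self = ~torch.eye(len(feats), dtype=torch.bool, device=feats.device)

    pos, neg = same_cls & not_self, ~same_cls
    # The three pairwise terms named in the abstract: cross-modality pairs,
    # within-modality pairs, and (our reading of "combination") all pairs.
    losses = torch.stack([
        hardest_triplet_loss(dist, pos & ~same_mod, neg & ~same_mod, margin),
        hardest_triplet_loss(dist, pos & same_mod, neg & same_mod, margin),
        hardest_triplet_loss(dist, pos, neg, margin),
    ])
    # Placeholder for the paper's adaptive weighting: a softmax over the
    # detached per-term losses, so currently harder terms get larger weights.
    weights = torch.softmax(losses.detach(), dim=0)
    return (weights * losses).sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    feats = F.normalize(torch.randn(8, 128), dim=1)
    labels = torch.tensor([0, 0, 1, 1, 0, 0, 1, 1])
    modality = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])    # 4 sketches, 4 photos
    print(mathm_style_loss(feats, labels, modality).item())
```

Restricting the pair masks is what makes the mining "modality-aware": the same batch yields different hardest positives and negatives depending on which modality combinations are admissible for each loss term.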

Bibliographic Details
Published in: arXiv.org, 2021-12
Main Authors: Huang, Zongheng; Sun, YiFan; Han, Chuchu; Gao, Changxin; Sang, Nong
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Bells; Image management; Image retrieval; Learning; Metric space; Training; Weighting methods
Online Access: Full text