Interpretable Foreground Object Search As Knowledge Distillation

Bibliographic details
Main authors: Li, Boren; Zhuang, Po-Yu; Gu, Jian; Li, Mingyang; Tan, Ping
Format: Article
Language: eng
creator Li, Boren
Zhuang, Po-Yu
Gu, Jian
Li, Mingyang
Tan, Ping
description This paper proposes a knowledge distillation method for foreground object search (FoS). Given a background and a rectangle specifying the foreground location and scale, FoS retrieves compatible foregrounds in a certain category for later image composition. Foregrounds within the same category can be grouped into a small number of patterns. Instances within each pattern are compatible with any query input interchangeably. These instances are referred to as interchangeable foregrounds. We first present a pipeline to build a pattern-level FoS dataset containing labels of interchangeable foregrounds. We then establish a benchmark dataset for further training and testing following the pipeline. As for the proposed method, we first train a foreground encoder to learn representations of interchangeable foregrounds. We then train a query encoder to learn query-foreground compatibility following a knowledge distillation framework. It aims to transfer knowledge from interchangeable foregrounds to supervise representation learning of compatibility. The query feature representation is projected to the same latent space as interchangeable foregrounds, enabling very efficient and interpretable instance-level search. Furthermore, pattern-level search is feasible to retrieve more controllable, reasonable and diverse foregrounds. The proposed method outperforms the previous state of the art by 10.42% absolute and 24.06% relative in mean average precision (mAP). Extensive experimental results also demonstrate its efficacy from various aspects. The benchmark dataset and code will be released shortly.
doi_str_mv 10.48550/arxiv.2007.09867
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2007_09867</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2007_09867</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Interpretable Foreground Object Search As Knowledge Distillation</title><source>arXiv.org</source><creator>Li, Boren ; Zhuang, Po-Yu ; Gu, Jian ; Li, Mingyang ; Tan, Ping</creator><identifier>DOI: 10.48550/arxiv.2007.09867</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2020-07</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2007.09867$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2007.09867$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><au>Li, Boren</au><au>Zhuang, Po-Yu</au><au>Gu, Jian</au><au>Li, Mingyang</au><au>Tan, Ping</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Interpretable Foreground Object Search As Knowledge Distillation</atitle><date>2020-07-19</date><risdate>2020</risdate><doi>10.48550/arxiv.2007.09867</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2007.09867
language eng
recordid cdi_arxiv_primary_2007_09867
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Interpretable Foreground Object Search As Knowledge Distillation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A16%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interpretable%20Foreground%20Object%20Search%20As%20Knowledge%20Distillation&rft.au=Li,%20Boren&rft.date=2020-07-19&rft_id=info:doi/10.48550/arxiv.2007.09867&rft_dat=%3Carxiv_GOX%3E2007_09867%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
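
The description in this record states that the query feature is projected into the same latent space as the foreground features, so that instance-level search reduces to comparing vectors, and that results are scored by mean average precision (mAP). The following is a minimal sketch of that idea only, not the authors' implementation: the two-dimensional embeddings, the foreground names, the use of cosine similarity, and the helper functions `cosine`, `search`, and `average_precision` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query_vec, gallery, top_k=3):
    """Rank gallery foregrounds by similarity to the query embedding."""
    scored = [(cosine(query_vec, vec), name) for name, vec in gallery.items()]
    # Highest similarity first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical foreground embeddings; a real system would obtain these from a trained encoder.
gallery = {
    "person_standing_a": [1.0, 0.1],
    "person_standing_b": [0.9, 0.2],
    "person_sitting_a": [0.1, 1.0],
}
# Hypothetical query embedding (background plus rectangle), assumed to live in the same space.
query = [1.0, 0.0]

ranked = search(query, gallery)
ap = average_precision(ranked, {"person_standing_a", "person_standing_b"})
print(ranked, ap)
```

In this toy setup the two "standing" foregrounds play the role of one pattern of interchangeable foregrounds. mAP would be the mean of `average_precision` over many queries.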