Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation

We propose a new framework that automatically generates high-quality segmentation masks with their referring expressions as pseudo supervisions for referring image segmentation (RIS). These pseudo supervisions allow the training of any supervised RIS methods without the cost of manual labeling. To achieve this, we incorporate existing segmentation and image captioning foundation models, leveraging their broad generalization capabilities. However, the naive incorporation of these models may generate non-distinctive expressions that do not distinctively refer to the target masks. To address this challenge, we propose two-fold strategies that generate distinctive captions: 1) 'distinctive caption sampling', a new decoding method for the captioning model, to generate multiple expression candidates with detailed words focusing on the target. 2) 'distinctiveness-based text filtering' to further validate the candidates and filter out those with a low level of distinctiveness. These two strategies ensure that the generated text supervisions can distinguish the target from other objects, making them appropriate for the RIS annotations. Our method significantly outperforms both weakly and zero-shot SoTA methods on the RIS benchmark datasets. It also surpasses fully supervised methods in unseen domains, proving its capability to tackle the open-world challenge within RIS. Furthermore, integrating our method with human annotations yields further improvements, highlighting its potential in semi-supervised learning applications.
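
The abstract describes the two strategies only at a high level. As a rough illustration of the second one, the sketch below implements a distinctiveness-based filter under assumed details: the candidate captions, the caption-to-mask similarity function, and the margin-style score and threshold are hypothetical stand-ins, not the paper's actual formulation.

```python
# Minimal sketch of 'distinctiveness-based text filtering' as described in
# the abstract. The similarity function and the margin-based score below are
# assumptions for illustration; the paper's exact scoring may differ.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredCaption:
    text: str
    distinctiveness: float


def filter_distinctive_captions(
    candidates: List[str],                    # captions sampled for the target mask
    similarity: Callable[[str, int], float],  # caption-vs-mask similarity (e.g. CLIP-style)
    target_idx: int,                          # index of the target mask
    num_masks: int,                           # total masks found in the image
    threshold: float = 0.1,                   # hypothetical cut-off
) -> List[ScoredCaption]:
    """Keep only captions that match the target mask clearly better than
    every other mask in the image, i.e. captions that are 'distinctive'."""
    kept: List[ScoredCaption] = []
    for text in candidates:
        sim_target = similarity(text, target_idx)
        sim_others = [similarity(text, i) for i in range(num_masks) if i != target_idx]
        # Margin between the target and the best-matching distractor mask.
        score = sim_target - max(sim_others, default=float("-inf"))
        if score >= threshold:
            kept.append(ScoredCaption(text, score))
    # Most distinctive captions first, ready to be used as pseudo labels.
    return sorted(kept, key=lambda c: c.distinctiveness, reverse=True)
```

In this sketch a caption survives only if its similarity margin over every other mask clears the threshold, mirroring the abstract's requirement that the generated text distinguish the target from other objects in the image.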

Bibliographic Details
Published in: arXiv.org, 2024-07
Main authors: Yu, Seonghoon; Seo, Paul Hongsuck; Son, Jeany
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Image annotation; Image quality; Image segmentation; Low level; Masks; Semi-supervised learning; Target masking
Online access: Full text