Towards Omni-supervised Referring Expression Segmentation

Referring Expression Segmentation (RES) is an emerging task in computer vision that segments target instances in images based on text descriptions. However, its development is hampered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning paradigm, in which the weak labels are not directly transformed into supervision signals but instead serve as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results demonstrate the clear merits of Omni-RES over fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model reach 100% of its fully supervised performance and outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+. More importantly, Omni-RES also enables the use of large-scale vision-language datasets such as Visual Genome to facilitate low-cost RES training, achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
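The abstract describes the core mechanism only at a high level: a teacher model produces pseudo-masks, and the weak label (e.g., a grounding box) is used as a yardstick to decide which pseudo-masks are trustworthy enough for student training. The sketch below is not the authors' code; it is a minimal, hypothetical illustration of that selection idea, in which the mask-to-box IoU test, the 0.5 threshold, and all function names are assumptions standing in for whatever criterion the paper actually uses.

```python
# Minimal sketch (assumed, not the authors' implementation) of selecting teacher
# pseudo-masks by checking agreement with a weak grounding-box label.
import numpy as np


def mask_to_box(mask: np.ndarray):
    """Tight bounding box (x1, y1, x2, y2) of a binary mask; None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())


def box_iou(a, b) -> float:
    """IoU of two (x1, y1, x2, y2) boxes in pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_masks(pseudo_masks, weak_boxes, iou_thresh=0.5):
    """Keep only pseudo-masks whose tight box agrees with the weak grounding box."""
    selected = []
    for mask, box in zip(pseudo_masks, weak_boxes):
        mbox = mask_to_box(mask)
        if mbox is not None and box_iou(mbox, box) >= iou_thresh:
            selected.append((mask, box))
    return selected


if __name__ == "__main__":
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:6, 2:6] = True                                 # a 4x4 teacher pseudo-mask
    weak_box = (2, 2, 5, 5)                               # weak grounding-box annotation
    print(len(select_pseudo_masks([mask], [weak_box])))   # -> 1 (mask is kept)
```

In an actual Omni-RES pipeline the kept masks would then be refined and used as pseudo ground truth for the student; the agreement rule and threshold here are placeholders for the paper's actual selection and refinement procedure.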

Detailed Description

Saved in:
Bibliographic Details
Published in: arXiv.org 2023-11
Main authors: Huang, Minglang, Zhou, Yiyi, Luo, Gen, Jiang, Guannan, Zhuang, Weilin, Sun, Xiaoshuai
Format: Article
Language: English
Subjects: Cognitive tasks; Computer vision; Image segmentation; Labels; Learning; Teachers; Training
Online access: Full text
container_title arXiv.org
creator Huang, Minglang
Zhou, Yiyi
Luo, Gen
Jiang, Guannan
Zhuang, Weilin
Sun, Xiaoshuai
description Referring Expression Segmentation (RES) is an emerging task in computer vision that segments target instances in images based on text descriptions. However, its development is hampered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning paradigm, in which the weak labels are not directly transformed into supervision signals but instead serve as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results demonstrate the clear merits of Omni-RES over fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model reach 100% of its fully supervised performance and outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+. More importantly, Omni-RES also enables the use of large-scale vision-language datasets such as Visual Genome to facilitate low-cost RES training, achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2885375745
source Free E-Journals
subjects Cognitive tasks
Computer vision
Image segmentation
Labels
Learning
Teachers
Training
title Towards Omni-supervised Referring Expression Segmentation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A09%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Towards%20Omni-supervised%20Referring%20Expression%20Segmentation&rft.jtitle=arXiv.org&rft.au=Huang,%20Minglang&rft.date=2023-11-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2885375745%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2885375745&rft_id=info:pmid/&rfr_iscdi=true