Towards Omni-supervised Referring Expression Segmentation

Referring Expression Segmentation (RES) is an emerging task in computer vision that segments target instances in images based on text descriptions. However, its development is hampered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning paradigm, in which the weak labels are not directly transformed into supervision signals but instead serve as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results demonstrate the clear merits of Omni-RES over fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model reach 100% of its fully supervised performance and outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+. More importantly, Omni-RES also enables the use of large-scale vision-language datasets such as Visual Genome to facilitate low-cost RES training, achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
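The abstract describes the core mechanism only at a high level: a teacher model produces pseudo-masks, and the weak label (e.g., a grounding box) is used as a yardstick to decide which pseudo-masks are trustworthy enough for student training. The sketch below is not the authors' code; it is a minimal, hypothetical illustration of that selection idea, in which the mask-to-box IoU test, the 0.5 threshold, and all function names are assumptions standing in for whatever criterion the paper actually uses.

```python
# Minimal sketch (assumed, not the authors' implementation) of selecting teacher
# pseudo-masks by checking agreement with a weak grounding-box label.
import numpy as np


def mask_to_box(mask: np.ndarray):
    """Tight bounding box (x1, y1, x2, y2) of a binary mask; None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())


def box_iou(a, b) -> float:
    """IoU of two (x1, y1, x2, y2) boxes in pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_masks(pseudo_masks, weak_boxes, iou_thresh=0.5):
    """Keep only pseudo-masks whose tight box agrees with the weak grounding box."""
    selected = []
    for mask, box in zip(pseudo_masks, weak_boxes):
        mbox = mask_to_box(mask)
        if mbox is not None and box_iou(mbox, box) >= iou_thresh:
            selected.append((mask, box))
    return selected


if __name__ == "__main__":
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:6, 2:6] = True                                 # a 4x4 teacher pseudo-mask
    weak_box = (2, 2, 5, 5)                               # weak grounding-box annotation
    print(len(select_pseudo_masks([mask], [weak_box])))   # -> 1 (mask is kept)
```

In an actual Omni-RES pipeline the kept masks would then be refined and used as pseudo ground truth for the student; the agreement rule and threshold here are placeholders for the paper's actual selection and refinement procedure.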

Detailed Description

Saved in:
Bibliographic Details
Published in: arXiv.org 2023-11
Main authors: Huang, Minglang, Zhou, Yiyi, Luo, Gen, Jiang, Guannan, Zhuang, Weilin, Sun, Xiaoshuai
Format: Article
Language: English
Subjects: Cognitive tasks; Computer vision; Image segmentation; Labels; Learning; Teachers; Training
Online access: Full text
container_title arXiv.org
creator Huang, Minglang
Zhou, Yiyi
Luo, Gen
Jiang, Guannan
Zhuang, Weilin
Sun, Xiaoshuai
description Referring Expression Segmentation (RES) is an emerging task in computer vision that segments target instances in images based on text descriptions. However, its development is hampered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning paradigm, in which the weak labels are not directly transformed into supervision signals but instead serve as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on several RES datasets. The experimental results demonstrate the clear merits of Omni-RES over fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model reach 100% of its fully supervised performance and outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+. More importantly, Omni-RES also enables the use of large-scale vision-language datasets such as Visual Genome to facilitate low-cost RES training, achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2885375745
source Free E-Journals
subjects Cognitive tasks
Computer vision
Image segmentation
Labels
Learning
Teachers
Training
title Towards Omni-supervised Referring Expression Segmentation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A09%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Towards%20Omni-supervised%20Referring%20Expression%20Segmentation&rft.jtitle=arXiv.org&rft.au=Huang,%20Minglang&rft.date=2023-11-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2885375745%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2885375745&rft_id=info:pmid/&rfr_iscdi=true