Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning, where the weak labels are not directly transformed into supervision signals but used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on a range of RES datasets. The experimental results show the obvious merits of Omni-RES over the fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES can help the base model achieve 100% of the fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets like Visual Genome to facilitate low-cost RES training, and achieves new SOTA performance on RES, e.g., 80.66 on RefCOCO.
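To make the "weak label as a yardstick" idea in the abstract more concrete, below is a minimal sketch of how a weak label could be used to select and refine a teacher's pseudo-mask before it is handed to the student. It assumes the weak label is a grounding box, agreement is measured by box IoU, and refinement simply clips the mask to the box; the function names, the 0.5 threshold, and the refinement rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the selection/refinement step described in the abstract:
# a teacher's pseudo-mask is kept for student training only if it agrees with the
# weak grounding-box label (measured here by box IoU), and is refined by discarding
# pixels outside that box. All criteria below are assumptions, not the paper's method.
import numpy as np


def mask_to_box(mask):
    """Tight bounding box (x1, y1, x2, y2) around a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def box_iou(a, b):
    """IoU between two (x1, y1, x2, y2) boxes in pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)


def select_and_refine(pseudo_mask, weak_box, iou_thresh=0.5):
    """Return a refined pseudo-mask if it passes the weak-label check, else None."""
    if pseudo_mask.sum() == 0:
        return None  # teacher produced an empty mask
    if box_iou(mask_to_box(pseudo_mask), weak_box) < iou_thresh:
        return None  # pseudo-mask disagrees with the weak box label; discard it
    x1, y1, x2, y2 = weak_box
    refined = np.zeros_like(pseudo_mask)
    refined[y1:y2 + 1, x1:x2 + 1] = pseudo_mask[y1:y2 + 1, x1:x2 + 1]
    return refined


if __name__ == "__main__":
    # Toy check: a 10x10 pseudo-mask that roughly agrees with the weak box.
    mask = np.zeros((10, 10), dtype=np.uint8)
    mask[2:7, 2:7] = 1
    print(select_and_refine(mask, weak_box=(2, 2, 6, 6)))
```

In a teacher-student pipeline of the kind the abstract describes, pseudo-masks that pass such a check would presumably serve as pseudo ground truth for the student, while rejected ones are simply not used as supervision.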
Saved in:
Published in: | arXiv.org 2023-11 |
---|---|
Main authors: | Huang, Minglang; Zhou, Yiyi; Luo, Gen; Jiang, Guannan; Zhuang, Weilin; Sun, Xiaoshuai |
Format: | Article |
Language: | eng |
Subjects: | Cognitive tasks; Computer vision; Image segmentation; Labels; Learning; Teachers; Training |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Huang, Minglang; Zhou, Yiyi; Luo, Gen; Jiang, Guannan; Zhuang, Weilin; Sun, Xiaoshuai |
description | Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning, where the weak labels are not directly transformed into supervision signals but used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on a range of RES datasets. The experimental results show the obvious merits of Omni-RES over the fully supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES can help the base model achieve 100% of the fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets like Visual Genome to facilitate low-cost RES training, and achieves new SOTA performance on RES, e.g., 80.66 on RefCOCO. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2885375745 |
source | Free E-Journals |
subjects | Cognitive tasks; Computer vision; Image segmentation; Labels; Learning; Teachers; Training |
title | Towards Omni-supervised Referring Expression Segmentation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A09%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Towards%20Omni-supervised%20Referring%20Expression%20Segmentation&rft.jtitle=arXiv.org&rft.au=Huang,%20Minglang&rft.date=2023-11-27&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2885375745%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2885375745&rft_id=info:pmid/&rfr_iscdi=true |