Enhanced Pseudo-Label Generation With Self-Supervised Training for Weakly-Supervised Semantic Segmentation

Due to the high cost of the pixel-level labels required for fully-supervised semantic segmentation, weakly-supervised segmentation has recently emerged as a more viable option. Existing weakly-supervised methods try to generate pseudo-labels without pixel-level labels, but a common problem is that the generated pseudo-labels contain insufficient semantic information, resulting in poor accuracy. To address this challenge, a novel method is proposed that generates class activation/attention maps (CAMs) containing sufficient semantic information, which serve as pseudo-labels for training semantic segmentation without pixel-level labels. In this method, an attention-transfer module is designed to preserve salient regions on CAMs while avoiding the suppression of inconspicuous regions of the targets, yielding pseudo-labels with sufficient semantic information. A pixel-relevance focused-unfocused module is also developed to better integrate contextual information: attention mechanisms extract focused relevant pixels, while multi-scale atrous convolution expands the receptive field to establish connections between distant pixels. Experiments demonstrate that the proposed method achieves competitive performance in weakly-supervised segmentation and even outperforms many saliency-joined methods.
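For readers who want a concrete picture of the pseudo-label pipeline the abstract describes, the sketch below computes a plain class activation map (CAM) in PyTorch and normalizes it so that thresholding yields a coarse pseudo-label. This is the generic CAM recipe, not the authors' enhanced attention-transfer pipeline; the ResNet-50 backbone, the hooked layer, and the normalization scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Minimal CAM sketch: project the classifier's FC weights onto the final
# convolutional feature map. Generic recipe only -- the paper's method
# enhances such maps; this just shows where a raw CAM comes from.
model = resnet50(weights="IMAGENET1K_V2").eval()

features = {}
def hook(_, __, output):
    features["conv"] = output  # (B, 2048, H/32, W/32)
model.layer4.register_forward_hook(hook)

@torch.no_grad()
def class_activation_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return a CAM for `class_idx`, upsampled to the input resolution."""
    _ = model(image)                      # populate features["conv"]
    fmap = features["conv"]               # (B, C, h, w)
    weights = model.fc.weight[class_idx]  # (C,) classifier weights
    cam = torch.einsum("c,bchw->bhw", weights, fmap)
    cam = F.relu(cam)                     # keep positive evidence only
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    cam = cam - cam.amin(dim=(-2, -1), keepdim=True)
    cam = cam / cam.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
    return cam.squeeze(1)                 # (B, H, W) in [0, 1]

# Thresholding such a map gives a coarse pseudo-label. The problem the
# paper targets is that raw CAMs highlight only the most discriminative
# parts of an object, so the pseudo-label misses inconspicuous regions.
```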

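The abstract also mentions multi-scale atrous convolution for establishing distant pixel connections. A common way to realize that idea is a block of parallel dilated convolutions (ASPP-style), sketched below; the dilation rates and channel widths are conventional defaults, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class MultiScaleAtrous(nn.Module):
    """Parallel atrous convolutions at several dilation rates (ASPP-style).

    Each branch sees the same input with a different receptive field, so
    concatenating the branches links nearby and distant pixels at once.
    The rates (1, 6, 12, 18) are conventional defaults, not the paper's.
    """
    def __init__(self, in_ch: int = 2048, out_ch: int = 256,
                 rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # padding == dilation for 3x3 kernels keeps the spatial size fixed,
        # so the branch outputs can be concatenated along channels.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# Usage: x = torch.randn(1, 2048, 32, 32)
#        MultiScaleAtrous()(x).shape  ->  (1, 256, 32, 32)
```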

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-08, Vol. 34 (8), p. 7017-7028
Main authors: Qin, Zhen; Chen, Yujie; Zhu, Guosong; Zhou, Erqiang; Zhou, Yingjie; Zhou, Yicong; Zhu, Ce
Format: Article
Language: English
Subjects: attention transfer mechanism; CAMs; class attention/activation maps; Convolution; Feature extraction; Semantic segmentation; Semantics; Task analysis; Training; weakly-supervised learning
Online access: Order full text
DOI: 10.1109/TCSVT.2024.3364764
ISSN: 1051-8215
EISSN: 1558-2205
Source: IEEE/IET Electronic Library (IEL)