DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis
Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet .
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), p. 7844-7855
Main authors: Zhou, Wujie; Jian, Bitao; Fang, Meixin; Dong, Xiena; Liu, Yuanyuan; Jiang, Qiuping
Format: Article
Language: English
Online access: Order full text
container_end_page | 7855 |
container_issue | 9 |
container_start_page | 7844 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 34 |
creator | Zhou, Wujie; Jian, Bitao; Fang, Meixin; Dong, Xiena; Liu, Yuanyuan; Jiang, Qiuping
description | Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet . |
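The abstract describes training a compact student network with a well-trained teacher via multilevel knowledge distillation. For illustration only, the sketch below shows the general pattern of such a scheme in PyTorch: a temperature-softened logit-matching term plus per-level feature matching on top of the hard-label loss. The function names, loss weights, and exact combination of terms are assumptions made for this sketch, not the DGPINet-KD implementation; the authors' actual code is in the linked repository.

```python
# Minimal, generic sketch of multilevel teacher-student knowledge distillation.
# Hypothetical names and weights; not the authors' released implementation.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student logits."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def multilevel_distillation_loss(student_feats, teacher_feats,
                                 student_logits, teacher_logits,
                                 labels, alpha=0.5, beta=0.1):
    """Hard-label cross-entropy + logit KD + per-level feature matching."""
    task_loss = F.cross_entropy(student_logits, labels, ignore_index=255)
    logit_loss = kd_loss(student_logits, teacher_logits)
    # Feature-level distillation: L2 distance at each corresponding level,
    # with teacher features detached so only the student is updated.
    feat_loss = sum(F.mse_loss(s, t.detach())
                    for s, t in zip(student_feats, teacher_feats))
    return task_loss + alpha * logit_loss + beta * feat_loss
```

In a setting like the one the abstract outlines, the student and teacher feature lists would come from corresponding decoder stages, and a relational term over the integration unit's outputs could be added on top of this baseline; those details are specific to the paper and not reproduced here.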
doi_str_mv | 10.1109/TCSVT.2024.3382354 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2024-09, Vol.34 (9), p.7844-7855 |
issn | 1051-8215; 1558-2205
language | eng |
recordid | cdi_ieee_primary_10480703 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; Availability; branch attention; Circuits and systems; Computational modeling; Datasets; Decoding; depth guidance; Depth measurement; Feature extraction; Image segmentation; Indoor environment; indoor scene analysis; Knowledge discovery; knowledge distillation; Logic gates; Parameters; RGB-D data; Scene analysis; Semantic segmentation; Semantics; Source code; Spatial data
title | DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T03%3A37%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DGPINet-KD:%20Deep%20Guided%20and%20Progressive%20Integration%20Network%20With%20Knowledge%20Distillation%20for%20RGB-D%20Indoor%20Scene%20Analysis&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Zhou,%20Wujie&rft.date=2024-09&rft.volume=34&rft.issue=9&rft.spage=7844&rft.epage=7855&rft.pages=7844-7855&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2024.3382354&rft_dat=%3Cproquest_RIE%3E3112218120%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3112218120&rft_id=info:pmid/&rft_ieee_id=10480703&rfr_iscdi=true |