DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis

Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters.

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), p. 7844-7855
Main authors: Zhou, Wujie; Jian, Bitao; Fang, Meixin; Dong, Xiena; Liu, Yuanyuan; Jiang, Qiuping
Format: Article
Language: English
Subjects:
container_end_page 7855
container_issue 9
container_start_page 7844
container_title IEEE transactions on circuits and systems for video technology
container_volume 34
creator Zhou, Wujie
Jian, Bitao
Fang, Meixin
Dong, Xiena
Liu, Yuanyuan
Jiang, Qiuping
description Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet .
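To make the distillation step described above concrete, the following is a minimal sketch of a multilevel knowledge-distillation loss of the kind the abstract outlines: cross-entropy on ground-truth labels, a softened-logit KL term against a frozen, well-trained teacher, and feature-matching terms at several decoding levels. All function names, arguments, weights, and shapes here are illustrative assumptions, not the authors' implementation; the released code is at https://github.com/XUEXIKUAIL/DGPINet .

# Hedged sketch: multilevel knowledge-distillation loss for semantic segmentation.
# The student is supervised by (1) per-pixel cross-entropy on ground-truth labels,
# (2) KL divergence on temperature-softened logits from a frozen teacher, and
# (3) feature matching at several intermediate (decoding) levels.
# All names and loss weights below are hypothetical illustrations.
import torch
import torch.nn.functional as F

def multilevel_kd_loss(student_out, teacher_out, labels,
                       temperature=4.0, alpha=0.5, beta=0.1):
    # student_out / teacher_out: (logits [N,C,H,W], list of intermediate features)
    s_logits, s_feats = student_out
    t_logits, t_feats = teacher_out

    # Task loss: per-pixel cross-entropy against the ground-truth segmentation map.
    ce = F.cross_entropy(s_logits, labels, ignore_index=255)

    # Response-level KD: KL divergence between softened class distributions.
    kd = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Multilevel (feature) KD: match student features to detached teacher features.
    # Assumes the student features were already projected to the teacher's shapes.
    feat = sum(F.mse_loss(sf, tf.detach()) for sf, tf in zip(s_feats, t_feats))

    return ce + alpha * kd + beta * feat

# Toy check with random tensors (40 classes, as in NYUDv2).
if __name__ == "__main__":
    N, C, H, W = 2, 40, 60, 80
    s = (torch.randn(N, C, H, W), [torch.randn(N, 64, H, W)])
    t = (torch.randn(N, C, H, W), [torch.randn(N, 64, H, W)])
    y = torch.randint(0, C, (N, H, W))
    print(multilevel_kd_loss(s, t, y))

Weighting the feature term separately (beta) lets the multilevel feature losses be tuned independently of the response-level term, which is common practice in this style of distillation.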
doi_str_mv 10.1109/TCSVT.2024.3382354
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_10480703</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10480703</ieee_id><sourcerecordid>3112218120</sourcerecordid><originalsourceid>FETCH-LOGICAL-c211t-1c158f169a1e7cbbab6f99d367a27e8fba9f443163575d59fc039af66be534003</originalsourceid><addsrcrecordid>eNpNkF1PwjAUQBejiYj-AeNDE5-Hve26D9-QKRKIEkF9XLrtFotzxXZI-PcO4cGn3ibn3Nwcz7sE2gOgyc18MHub9xhlQY_zmHERHHkdECL2GaPiuJ2pAD9mIE69M-eWlEIQB1HH26TD6egJG3-c3pIUcUWGa11iSWRdkqk1C4vO6R8ko7rBhZWNNjVp-Y2xn-RdNx9kXJtNheUCSapdo6tqzyhjycvwzk9bszTtZ1ZgjaRfy2rrtDv3TpSsHF4c3q73-nA_Hzz6k-fhaNCf-AUDaHwoQMQKwkQCRkWeyzxUSVLyMJIswljlMlFBwCHkIhKlSFRBeSJVGOYoeEAp73rX-70ra77X6Jpsada2PcJlHIAxiIHtKLanCmucs6iyldVf0m4zoNkucPYXONsFzg6BW-lqL2lE_CcEMY0o57_Gxnbz</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3112218120</pqid></control><display><type>article</type><title>DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis</title><source>IEEE Electronic Library (IEL)</source><creator>Zhou, Wujie ; Jian, Bitao ; Fang, Meixin ; Dong, Xiena ; Liu, Yuanyuan ; Jiang, Qiuping</creator><creatorcontrib>Zhou, Wujie ; Jian, Bitao ; Fang, Meixin ; Dong, Xiena ; Liu, Yuanyuan ; Jiang, Qiuping</creatorcontrib><description>Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. 
The source code is available at https://github.com/XUEXIKUAIL/DGPINet .</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2024.3382354</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accuracy ; Availability ; branch attention ; Circuits and systems ; Computational modeling ; Datasets ; Decoding ; depth guidance ; Depth measurement ; Feature extraction ; Image segmentation ; Indoor environment ; indoor scene analysis ; Knowledge discovery ; knowledge distillation ; Logic gates ; Parameters ; RGB-D data ; Scene analysis ; Semantic segmentation ; Semantics ; Source code ; Spatial data</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2024-09, Vol.34 (9), p.7844-7855</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c211t-1c158f169a1e7cbbab6f99d367a27e8fba9f443163575d59fc039af66be534003</citedby><cites>FETCH-LOGICAL-c211t-1c158f169a1e7cbbab6f99d367a27e8fba9f443163575d59fc039af66be534003</cites><orcidid>0009-0006-1327-4416 ; 0000-0002-3055-2493 ; 0000-0002-6025-9343 ; 0000-0003-0465-3976</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10480703$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54737</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10480703$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhou, Wujie</creatorcontrib><creatorcontrib>Jian, Bitao</creatorcontrib><creatorcontrib>Fang, Meixin</creatorcontrib><creatorcontrib>Dong, Xiena</creatorcontrib><creatorcontrib>Liu, Yuanyuan</creatorcontrib><creatorcontrib>Jiang, Qiuping</creatorcontrib><title>DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. 
Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet .</description><subject>Accuracy</subject><subject>Availability</subject><subject>branch attention</subject><subject>Circuits and systems</subject><subject>Computational modeling</subject><subject>Datasets</subject><subject>Decoding</subject><subject>depth guidance</subject><subject>Depth measurement</subject><subject>Feature extraction</subject><subject>Image segmentation</subject><subject>Indoor environment</subject><subject>indoor scene analysis</subject><subject>Knowledge discovery</subject><subject>knowledge distillation</subject><subject>Logic gates</subject><subject>Parameters</subject><subject>RGB-D data</subject><subject>Scene analysis</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><subject>Source code</subject><subject>Spatial data</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkF1PwjAUQBejiYj-AeNDE5-Hve26D9-QKRKIEkF9XLrtFotzxXZI-PcO4cGn3ibn3Nwcz7sE2gOgyc18MHub9xhlQY_zmHERHHkdECL2GaPiuJ2pAD9mIE69M-eWlEIQB1HH26TD6egJG3-c3pIUcUWGa11iSWRdkqk1C4vO6R8ko7rBhZWNNjVp-Y2xn-RdNx9kXJtNheUCSapdo6tqzyhjycvwzk9bszTtZ1ZgjaRfy2rrtDv3TpSsHF4c3q73-nA_Hzz6k-fhaNCf-AUDaHwoQMQKwkQCRkWeyzxUSVLyMJIswljlMlFBwCHkIhKlSFRBeSJVGOYoeEAp73rX-70ra77X6Jpsada2PcJlHIAxiIHtKLanCmucs6iyldVf0m4zoNkucPYXONsFzg6BW-lqL2lE_CcEMY0o57_Gxnbz</recordid><startdate>202409</startdate><enddate>202409</enddate><creator>Zhou, Wujie</creator><creator>Jian, Bitao</creator><creator>Fang, Meixin</creator><creator>Dong, Xiena</creator><creator>Liu, Yuanyuan</creator><creator>Jiang, Qiuping</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0009-0006-1327-4416</orcidid><orcidid>https://orcid.org/0000-0002-3055-2493</orcidid><orcidid>https://orcid.org/0000-0002-6025-9343</orcidid><orcidid>https://orcid.org/0000-0003-0465-3976</orcidid></search><sort><creationdate>202409</creationdate><title>DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis</title><author>Zhou, Wujie ; Jian, Bitao ; Fang, Meixin ; Dong, Xiena ; Liu, Yuanyuan ; Jiang, Qiuping</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c211t-1c158f169a1e7cbbab6f99d367a27e8fba9f443163575d59fc039af66be534003</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Availability</topic><topic>branch attention</topic><topic>Circuits and systems</topic><topic>Computational modeling</topic><topic>Datasets</topic><topic>Decoding</topic><topic>depth guidance</topic><topic>Depth measurement</topic><topic>Feature extraction</topic><topic>Image segmentation</topic><topic>Indoor environment</topic><topic>indoor scene analysis</topic><topic>Knowledge discovery</topic><topic>knowledge distillation</topic><topic>Logic gates</topic><topic>Parameters</topic><topic>RGB-D data</topic><topic>Scene analysis</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><topic>Source code</topic><topic>Spatial data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Wujie</creatorcontrib><creatorcontrib>Jian, Bitao</creatorcontrib><creatorcontrib>Fang, Meixin</creatorcontrib><creatorcontrib>Dong, Xiena</creatorcontrib><creatorcontrib>Liu, Yuanyuan</creatorcontrib><creatorcontrib>Jiang, Qiuping</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhou, Wujie</au><au>Jian, Bitao</au><au>Fang, Meixin</au><au>Dong, Xiena</au><au>Liu, Yuanyuan</au><au>Jiang, Qiuping</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis</atitle><jtitle>IEEE transactions on circuits and systems for video 
technology</jtitle><stitle>TCSVT</stitle><date>2024-09</date><risdate>2024</risdate><volume>34</volume><issue>9</issue><spage>7844</spage><epage>7855</epage><pages>7844-7855</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet .</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2024.3382354</doi><tpages>12</tpages><orcidid>https://orcid.org/0009-0006-1327-4416</orcidid><orcidid>https://orcid.org/0000-0002-3055-2493</orcidid><orcidid>https://orcid.org/0000-0002-6025-9343</orcidid><orcidid>https://orcid.org/0000-0003-0465-3976</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2024-09, Vol.34 (9), p.7844-7855
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_10480703
source IEEE Electronic Library (IEL)
subjects Accuracy
Availability
branch attention
Circuits and systems
Computational modeling
Datasets
Decoding
depth guidance
Depth measurement
Feature extraction
Image segmentation
Indoor environment
indoor scene analysis
Knowledge discovery
knowledge distillation
Logic gates
Parameters
RGB-D data
Scene analysis
Semantic segmentation
Semantics
Source code
Spatial data
title DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis