DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis
Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet .
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), p. 7844-7855
Main authors: Zhou, Wujie; Jian, Bitao; Fang, Meixin; Dong, Xiena; Liu, Yuanyuan; Jiang, Qiuping
Format: Article
Language: English
Online access: Order full text
container_end_page | 7855 |
container_issue | 9 |
container_start_page | 7844 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 34 |
creator | Zhou, Wujie; Jian, Bitao; Fang, Meixin; Dong, Xiena; Liu, Yuanyuan; Jiang, Qiuping
description | Significant advancements in RGB-D semantic segmentation have been made owing to the increasing availability of robust depth information. Most researchers have combined depth with RGB data to capture complementary information in images. Although this approach improves segmentation performance, it requires excessive model parameters. To address this problem, we propose DGPINet-KD, a deep-guided and progressive integration network with knowledge distillation (KD) for RGB-D indoor scene analysis. First, we used branching attention and depth guidance to capture coordinated, precise location information and extract more complete spatial information from the depth map to complement the semantic information for the encoded features. Second, we trained the student network (DGPINet-S) with a well-trained teacher network (DGPINet-T) using a multilevel KD. Third, an integration unit was developed to explore the contextual dependencies of the decoding features and to enhance relational KD. Comprehensive experiments on two challenging indoor benchmark datasets, NYUDv2 and SUN RGB-D, demonstrated that DGPINet-KD achieved improved performance in indoor scene analysis tasks compared with existing methods. Notably, on the NYUDv2 dataset, DGPINet-KD (DGPINet-S with KD) achieves a pixel accuracy gain of 1.7% and a class accuracy gain of 2.3% compared with DGPINet-S. In addition, compared with DGPINet-T, the proposed DGPINet-KD (DGPINet-S with KD) utilizes significantly fewer parameters (29.3M) while maintaining accuracy. The source code is available at https://github.com/XUEXIKUAIL/DGPINet . |
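The abstract describes training a compact student network with a well-trained teacher via multilevel knowledge distillation. For illustration only, the sketch below shows the general pattern of such a scheme in PyTorch: a temperature-softened logit-matching term plus per-level feature matching on top of the hard-label loss. The function names, loss weights, and exact combination of terms are assumptions made for this sketch, not the DGPINet-KD implementation; the authors' actual code is in the linked repository.

```python
# Minimal, generic sketch of multilevel teacher-student knowledge distillation.
# Hypothetical names and weights; not the authors' released implementation.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student logits."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def multilevel_distillation_loss(student_feats, teacher_feats,
                                 student_logits, teacher_logits,
                                 labels, alpha=0.5, beta=0.1):
    """Hard-label cross-entropy + logit KD + per-level feature matching."""
    task_loss = F.cross_entropy(student_logits, labels, ignore_index=255)
    logit_loss = kd_loss(student_logits, teacher_logits)
    # Feature-level distillation: L2 distance at each corresponding level,
    # with teacher features detached so only the student is updated.
    feat_loss = sum(F.mse_loss(s, t.detach())
                    for s, t in zip(student_feats, teacher_feats))
    return task_loss + alpha * logit_loss + beta * feat_loss
```

In a setting like the one the abstract outlines, the student and teacher feature lists would come from corresponding decoder stages, and a relational term over the integration unit's outputs could be added on top of this baseline; those details are specific to the paper and not reproduced here.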
doi_str_mv | 10.1109/TCSVT.2024.3382354 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2024-09, Vol.34 (9), p.7844-7855 |
issn | 1051-8215; 1558-2205
language | eng |
recordid | cdi_ieee_primary_10480703 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; Availability; branch attention; Circuits and systems; Computational modeling; Datasets; Decoding; depth guidance; Depth measurement; Feature extraction; Image segmentation; Indoor environment; indoor scene analysis; Knowledge discovery; knowledge distillation; Logic gates; Parameters; RGB-D data; Scene analysis; Semantic segmentation; Semantics; Source code; Spatial data
title | DGPINet-KD: Deep Guided and Progressive Integration Network With Knowledge Distillation for RGB-D Indoor Scene Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T03%3A37%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DGPINet-KD:%20Deep%20Guided%20and%20Progressive%20Integration%20Network%20With%20Knowledge%20Distillation%20for%20RGB-D%20Indoor%20Scene%20Analysis&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Zhou,%20Wujie&rft.date=2024-09&rft.volume=34&rft.issue=9&rft.spage=7844&rft.epage=7855&rft.pages=7844-7855&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2024.3382354&rft_dat=%3Cproquest_RIE%3E3112218120%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3112218120&rft_id=info:pmid/&rft_ieee_id=10480703&rfr_iscdi=true |