Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification

The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction between different modalities while ignoring the optimization of each modality ...

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on geoscience and remote sensing 2024, Vol.62, p.1-12
Main authors: Wei, Shicai, Luo, Chunbo, Ma, Xiaoguang, Luo, Yang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 12
container_issue
container_start_page 1
container_title IEEE transactions on geoscience and remote sensing
container_volume 62
creator Wei, Shicai
Luo, Chunbo
Ma, Xiaoguang
Luo, Yang
description The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction between different modalities while ignoring the optimization of each modality. Existing studies show that high-performance modalities suppress the learning of weak ones, leading to under-optimized multimodal learning. To this end, we propose a general framework called the gradient decoupled network (GDNet) to assist multimodal remote sensing (RS) classification. GDNet guides each modality encoder in the multimodal model to learn probabilistic representations instead of deterministic ones. This helps decouple their gradients, reducing their influence on each other and encouraging them to learn modality-specific information. We then introduce a unimodal regularization for each modality encoder to align its logit output with the multimodal output and the label distribution simultaneously. This introduces independent gradient paths for each modality encoder, accelerating their optimization while preserving the modality-shared information. Finally, extensive experiments conducted on three benchmark datasets demonstrate that the proposed GDNet can effectively address the under-optimization problem in multimodal RS image classification. Code is available at https://github.com/shicaiwei123/TGRS-GDNet .
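
The two ingredients named in the abstract, probabilistic per-modality representations and a unimodal regularizer tied to both the labels and the fused prediction, can be sketched as a short training recipe. The Python/PyTorch code below is a minimal illustrative sketch only: the module names, layer sizes, temperature, and loss weighting are assumptions made here and are not taken from the paper or the linked repository, whose actual implementation may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticEncoder(nn.Module):
    # Encodes one modality as a Gaussian over features and returns a sample from it.
    def __init__(self, in_dim, feat_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, feat_dim)       # mean of the feature distribution
        self.logvar_head = nn.Linear(256, feat_dim)   # log-variance of the distribution

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: stochastic features per modality, which weakens
        # the deterministic gradient coupling between the branches.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def unimodal_regularizer(uni_logits, fused_logits, labels, temp=2.0):
    # Align one modality's logits with the ground-truth labels (cross-entropy)
    # and with the fused multimodal prediction (temperature-scaled KL divergence).
    ce = F.cross_entropy(uni_logits, labels)
    kl = F.kl_div(F.log_softmax(uni_logits / temp, dim=1),
                  F.softmax(fused_logits.detach() / temp, dim=1),
                  reduction="batchmean") * temp ** 2
    return ce + kl

# Hypothetical two-modality training step (input dimensions and names are made up):
#   enc_a, enc_b = ProbabilisticEncoder(144, 64), ProbabilisticEncoder(21, 64)
#   head_a, head_b = nn.Linear(64, n_classes), nn.Linear(64, n_classes)
#   fusion = nn.Linear(128, n_classes)
#   z_a, z_b = enc_a(x_hsi), enc_b(x_lidar)
#   fused_logits = fusion(torch.cat([z_a, z_b], dim=1))
#   loss = F.cross_entropy(fused_logits, labels) \
#          + unimodal_regularizer(head_a(z_a), fused_logits, labels) \
#          + unimodal_regularizer(head_b(z_b), fused_logits, labels)

In this sketch the fused logits are detached inside the KL term, so the regularizer updates only the unimodal branch; that choice mirrors the stated goal of giving each modality encoder its own gradient path rather than letting the fusion head dominate its optimization.
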
doi_str_mv 10.1109/TGRS.2024.3478393
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 0196-2892
ispartof IEEE transactions on geoscience and remote sensing, 2024, Vol.62, p.1-12
issn 0196-2892
1558-0644
language eng
recordid cdi_crossref_primary_10_1109_TGRS_2024_3478393
source IEEE Electronic Library (IEL)
subjects Classification
Convolutional neural networks
decoupling learning
deep learning
Feature extraction
Fuses
Image classification
Laser radar
multimodal
Optimization
Probabilistic logic
Remote sensing
remote sensing (RS)
Training
Transformers
title Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification