Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification

The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction between different modalities while ignoring the optimization of each modality ...

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on geoscience and remote sensing 2024, Vol.62, p.1-12
Main authors: Wei, Shicai, Luo, Chunbo, Ma, Xiaoguang, Luo, Yang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 12
container_issue
container_start_page 1
container_title IEEE transactions on geoscience and remote sensing
container_volume 62
creator Wei, Shicai
Luo, Chunbo
Ma, Xiaoguang
Luo, Yang
description The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction between different modalities while ignoring the optimization of each modality. Existing studies show that high-performance modalities suppress the learning of weak ones, leading to under-optimized multimodal learning. To this end, we propose a general framework called the gradient decoupled network (GDNet) to assist multimodal remote sensing (RS) classification. GDNet guides each modality encoder in the multimodal model to learn probabilistic representations instead of deterministic ones. This helps decouple their gradients, reducing their influence on each other and encouraging them to learn modality-specific information. We then introduce a unimodal regularization for each modality encoder to align its logit output with the multimodal output and the label distribution simultaneously. This introduces independent gradient paths for each modality encoder, accelerating their optimization while preserving the modality-shared information. Finally, extensive experiments conducted on three benchmark datasets demonstrate that the proposed GDNet can effectively address the under-optimization problem in multimodal RS image classification. Code is available at https://github.com/shicaiwei123/TGRS-GDNet .
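
The two ingredients named in the abstract, probabilistic per-modality representations and a unimodal regularizer tied to both the labels and the fused prediction, can be sketched as a short training recipe. The Python/PyTorch code below is a minimal illustrative sketch only: the module names, layer sizes, temperature, and loss weighting are assumptions made here and are not taken from the paper or the linked repository, whose actual implementation may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticEncoder(nn.Module):
    # Encodes one modality as a Gaussian over features and returns a sample from it.
    def __init__(self, in_dim, feat_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, feat_dim)       # mean of the feature distribution
        self.logvar_head = nn.Linear(256, feat_dim)   # log-variance of the distribution

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: stochastic features per modality, which weakens
        # the deterministic gradient coupling between the branches.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def unimodal_regularizer(uni_logits, fused_logits, labels, temp=2.0):
    # Align one modality's logits with the ground-truth labels (cross-entropy)
    # and with the fused multimodal prediction (temperature-scaled KL divergence).
    ce = F.cross_entropy(uni_logits, labels)
    kl = F.kl_div(F.log_softmax(uni_logits / temp, dim=1),
                  F.softmax(fused_logits.detach() / temp, dim=1),
                  reduction="batchmean") * temp ** 2
    return ce + kl

# Hypothetical two-modality training step (input dimensions and names are made up):
#   enc_a, enc_b = ProbabilisticEncoder(144, 64), ProbabilisticEncoder(21, 64)
#   head_a, head_b = nn.Linear(64, n_classes), nn.Linear(64, n_classes)
#   fusion = nn.Linear(128, n_classes)
#   z_a, z_b = enc_a(x_hsi), enc_b(x_lidar)
#   fused_logits = fusion(torch.cat([z_a, z_b], dim=1))
#   loss = F.cross_entropy(fused_logits, labels) \
#          + unimodal_regularizer(head_a(z_a), fused_logits, labels) \
#          + unimodal_regularizer(head_b(z_b), fused_logits, labels)

In this sketch the fused logits are detached inside the KL term, so the regularizer updates only the unimodal branch; that choice mirrors the stated goal of giving each modality encoder its own gradient path rather than letting the fusion head dominate its optimization.
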
doi_str_mv 10.1109/TGRS.2024.3478393
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 0196-2892
ispartof IEEE transactions on geoscience and remote sensing, 2024, Vol.62, p.1-12
issn 0196-2892
1558-0644
language eng
recordid cdi_crossref_primary_10_1109_TGRS_2024_3478393
source IEEE Electronic Library (IEL)
subjects Classification
Convolutional neural networks
decoupling learning
deep learning
Feature extraction
Fuses
Image classification
Laser radar
multimodal
Optimization
Probabilistic logic
Remote sensing
remote sensing (RS)
Training
Transformers
title Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification