Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification
The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction of different modality data while ignoring the optimization of each modality.
Saved in:
Published in: | IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, p. 1-12 |
---|---|
Main authors: | Wei, Shicai; Luo, Chunbo; Ma, Xiaoguang; Luo, Yang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 12 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on geoscience and remote sensing |
container_volume | 62 |
creator | Wei, Shicai; Luo, Chunbo; Ma, Xiaoguang; Luo, Yang |
description | The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction of different modality data while ignoring the optimization of each modality. Existing studies show that high-performance modalities will suppress the learning of weak ones, leading to under-optimized multimodal learning. To this end, we propose a general framework called gradient decoupled network (GDNet) to assist multimodal remote sensing (RS) classification. GDNet guides each modality encoder in the multimodal model to learn probabilistic representations instead of deterministic ones. This helps decouple their gradients, reducing their influence on each other and encouraging them to learn modality-specific information. Then, we further introduce a unimodal regularization for each modality encoder to align its logit output with the multimodal one and the label distribution simultaneously. This introduces an independent gradient path for each modality encoder to accelerate its optimization while preserving the modality-shared information. Finally, extensive experiments conducted on three benchmark datasets demonstrate that the proposed GDNet can effectively address the under-optimization problem in multimodal RS image classification. Code is available at https://github.com/shicaiwei123/TGRS-GDNet . |
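To make the idea in the abstract concrete, the sketch below shows one plausible way the described mechanism could look in PyTorch: each modality encoder outputs a Gaussian (mean, log-variance) and is sampled with the reparameterization trick (a probabilistic rather than deterministic representation), while a unimodal head per branch is trained against both the labels and the fused multimodal logits. This is not the authors' released implementation (that is at https://github.com/shicaiwei123/TGRS-GDNet); the names `ProbabilisticEncoder`, `GDNetSketch`, `gdnet_loss`, and the weights `alpha`/`beta` are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticEncoder(nn.Module):
    """Maps one modality to a Gaussian latent (mean, log-variance) and samples it
    with the reparameterization trick, i.e. a probabilistic representation."""
    def __init__(self, in_channels: int, latent_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64, latent_dim)
        self.fc_logvar = nn.Linear(64, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Sampling injects noise between the branches, loosening their gradient coupling.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class GDNetSketch(nn.Module):
    """Two-branch multimodal classifier with a fused head and per-modality heads."""
    def __init__(self, ch_a: int, ch_b: int, num_classes: int, latent_dim: int = 128):
        super().__init__()
        self.enc_a = ProbabilisticEncoder(ch_a, latent_dim)
        self.enc_b = ProbabilisticEncoder(ch_b, latent_dim)
        self.head_fused = nn.Linear(2 * latent_dim, num_classes)
        self.head_a = nn.Linear(latent_dim, num_classes)  # unimodal heads give each
        self.head_b = nn.Linear(latent_dim, num_classes)  # encoder its own gradient path

    def forward(self, x_a, x_b):
        z_a, _, _ = self.enc_a(x_a)
        z_b, _, _ = self.enc_b(x_b)
        logit_m = self.head_fused(torch.cat([z_a, z_b], dim=1))
        return logit_m, self.head_a(z_a), self.head_b(z_b)

def gdnet_loss(logit_m, logit_a, logit_b, labels, alpha=1.0, beta=1.0):
    """Multimodal cross-entropy plus a unimodal regularization that aligns each
    unimodal logit with the label distribution (CE) and the fused prediction (KL)."""
    loss = F.cross_entropy(logit_m, labels)
    for logit_u in (logit_a, logit_b):
        loss = loss + alpha * F.cross_entropy(logit_u, labels)
        loss = loss + beta * F.kl_div(
            F.log_softmax(logit_u, dim=1),
            F.softmax(logit_m.detach(), dim=1),  # fused logits treated as a fixed target
            reduction="batchmean",
        )
    return loss
```

In this reading, the per-branch cross-entropy supplies the independent gradient path mentioned in the abstract, while the KL term toward the detached fused logits preserves modality-shared information; how the actual paper weights or detaches these terms should be checked against the released code.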
doi_str_mv | 10.1109/TGRS.2024.3478393 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0196-2892 |
ispartof | IEEE transactions on geoscience and remote sensing, 2024, Vol.62, p.1-12 |
issn | 0196-2892; 1558-0644 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TGRS_2024_3478393 |
source | IEEE Electronic Library (IEL) |
subjects | Classification; Convolutional neural networks; decoupling learning; deep learning; Feature extraction; Fuses; Image classification; Laser radar; multimodal; Optimization; Probabilistic logic; Remote sensing; remote sensing (RS); Training; Transformers |
title | Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T17%3A53%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Gradient%20Decoupled%20Learning%20With%20Unimodal%20Regularization%20for%20Multimodal%20Remote%20Sensing%20Classification&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing&rft.au=Wei,%20Shicai&rft.date=2024&rft.volume=62&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=0196-2892&rft.eissn=1558-0644&rft.coden=IGRSD2&rft_id=info:doi/10.1109/TGRS.2024.3478393&rft_dat=%3Ccrossref_RIE%3E10_1109_TGRS_2024_3478393%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10714439&rfr_iscdi=true |