MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification

A growing number of earth observation satellites are able to simultaneously gather multimodal images of the same area due to the expanding availability and resolution of satellite remote sensing data. This paper proposes a novel multimodal balanced self-learning interaction network (MBSI-Net) for th...

Full description

Bibliographic details
Published in:	IEEE transactions on circuits and systems for video technology 2024-05, Vol.34 (5), p.3819-3833
Main authors:	Ma, Mengru; Ma, Wenping; Jiao, Licheng; Liu, Xu; Liu, Fang; Li, Lingling; Yang, Shuyuan
Format:	Article
Language:	eng
Subjects:	Equalization; Feature extraction; Image classification; Knowledge engineering; Knowledge management; Learning; Modules; multimodal; Remote sensing; Satellite observation; Satellites; Spatial resolution; Teachers; Texture; Training; transfer learning
Online access:	Order full text

container_end_page	3833
container_issue	5
container_start_page	3819
container_title	IEEE transactions on circuits and systems for video technology
container_volume	34
creator	Ma, Mengru; Ma, Wenping; Jiao, Licheng; Liu, Xu; Liu, Fang; Li, Lingling; Yang, Shuyuan
description	A growing number of earth observation satellites are able to simultaneously gather multimodal images of the same area due to the expanding availability and resolution of satellite remote sensing data. This paper proposes a novel multimodal balanced self-learning interaction network (MBSI-Net) for the classification task. It involves a dual-branch teacher-student network that enables knowledge interaction and transfer between the multimodalities. Firstly, in order to introduce statistical information in addition to local and global structural information, a texture feature equalization module (TFE-Module) is proposed. This can enhance the texture information of features through histogram equalization and further improve the representation ability of features. Secondly, to enable the student network to provide timely feedback questions, the paper proposes a feature fusion module (F2-Module) that models and enhances teacher features through the student network. This helps to raise the classification's accuracy by incorporating information from multimodal images. Finally, the paper proposes a loss function based on structural similarity analysis to ensure balanced self-learning between the student and the teacher networks. Taking the multispectral (MS) and the panchromatic (PAN) images of the same scene as examples, through experimental verification, the proposed method can achieve good results on multiple datasets compared with other methods. Therefore, it offers an effective method for classifying and fusing multimodal data.
doi_str_mv	10.1109/TCSVT.2023.3322470
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_3053298823</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10273708</ieee_id><sourcerecordid>3053298823</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-1e135f3dbbad7d3b0c94553996cbb12229d3cf964b745388a4160e07d61425eb3</originalsourceid><addsrcrecordid>eNpNkD1PwzAQhi0EEqXwBxCDJeYU-2wnDhtEfERqYWhBbJbjXKqUNCl2KsS_J6UdmO6G93nv9BByydmEc5beLLL5-2ICDMRECACZsCMy4krpCICp42FnikcauDolZyGsGONSy2REPmb38zx6wf6WzrZNX6-70jb03ja2dVjSOTZVNEXr27pd0rzt0VvX111LB-S785-06jzN13aJNGtsCHVVO7sLnJOTyjYBLw5zTN4eHxbZczR9fcqzu2nkhi_7iCMXqhJlUdgyKUXBXCqVEmkau6LgAJCWwlVpLItEKqG1lTxmyJIy5hIUFmJMrve9G999bTH0ZtVtfTucNIIpAanWIIYU7FPOdyF4rMzG12vrfwxnZmfQ_Bk0O4PmYHCArvZQjYj_AEhEwrT4BUTYbA0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053298823</pqid></control><display><type>article</type><title>MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification</title><source>IEEE Electronic Library (IEL)</source><creator>Ma, Mengru ; Ma, Wenping ; Jiao, Licheng ; Liu, Xu ; Liu, Fang ; Li, Lingling ; Yang, Shuyuan</creator><creatorcontrib>Ma, Mengru ; Ma, Wenping ; Jiao, Licheng ; Liu, Xu ; Liu, Fang ; Li, Lingling ; Yang, Shuyuan</creatorcontrib><description>A growing number of earth observation satellites are able to simultaneously gather multimodal images of the same area due to the expanding availability and resolution of satellite remote sensing data. This paper proposes a novel multimodal balanced self-learning interaction network (MBSI-Net) for the classification task. It involves a dual-branch teacher-student network that enables knowledge interaction and transfer between the multimodalities. Firstly, in order to introduce statistical information in addition to local and global structural information, a texture feature equalization module (TFE-Module) is proposed. 
This can enhance the texture information of features through histogram equalization and further improve the representation ability of features. Secondly, to enable the student network to provide timely feedback questions, the paper proposes a feature fusion module (F2-Module) that models and enhances teacher features through the student network. This helps to raise the classification's accuracy by incorporating information from multimodal images. Finally, the paper proposes a loss function based on structural similarity analysis to ensure balanced self-learning between the student and the teacher networks. Taking the multispectral (MS) and the panchromatic (PAN) images of the same scene as examples, through experimental verification, the proposed method can achieve good results on multiple datasets compared with other methods. Therefore, it offers an effective method for classifying and fusing multimodal data.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2023.3322470</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Equalization ; Feature extraction ; Image classification ; Knowledge engineering ; Knowledge management ; Learning ; Modules ; multimodal ; Remote sensing ; Satellite observation ; Satellites ; Spatial resolution ; Teachers ; Texture ; Training ; transfer learning</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2024-05, Vol.34 (5), p.3819-3833</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-1e135f3dbbad7d3b0c94553996cbb12229d3cf964b745388a4160e07d61425eb3</cites><orcidid>0000-0003-3354-9617 ; 0000-0002-6130-2518 ; 0000-0002-6802-539X ; 0000-0002-4796-5737 ; 0000-0002-5669-9354 ; 0000-0001-8872-2195 ; 0000-0002-8780-5455</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10273708$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10273708$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ma, Mengru</creatorcontrib><creatorcontrib>Ma, Wenping</creatorcontrib><creatorcontrib>Jiao, Licheng</creatorcontrib><creatorcontrib>Liu, Xu</creatorcontrib><creatorcontrib>Liu, Fang</creatorcontrib><creatorcontrib>Li, Lingling</creatorcontrib><creatorcontrib>Yang, Shuyuan</creatorcontrib><title>MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>A growing number of earth observation satellites are able to simultaneously gather multimodal images of the same area due to the expanding availability and resolution of satellite remote sensing data. This paper proposes a novel multimodal balanced self-learning interaction network (MBSI-Net) for the classification task. It involves a dual-branch teacher-student network that enables knowledge interaction and transfer between the multimodalities. Firstly, in order to introduce statistical information in addition to local and global structural information, a texture feature equalization module (TFE-Module) is proposed. 
This can enhance the texture information of features through histogram equalization and further improve the representation ability of features. Secondly, to enable the student network to provide timely feedback questions, the paper proposes a feature fusion module (F2-Module) that models and enhances teacher features through the student network. This helps to raise the classification's accuracy by incorporating information from multimodal images. Finally, the paper proposes a loss function based on structural similarity analysis to ensure balanced self-learning between the student and the teacher networks. Taking the multispectral (MS) and the panchromatic (PAN) images of the same scene as examples, through experimental verification, the proposed method can achieve good results on multiple datasets compared with other methods. Therefore, it offers an effective method for classifying and fusing multimodal data.</description><subject>Equalization</subject><subject>Feature extraction</subject><subject>Image classification</subject><subject>Knowledge engineering</subject><subject>Knowledge management</subject><subject>Learning</subject><subject>Modules</subject><subject>multimodal</subject><subject>Remote sensing</subject><subject>Satellite observation</subject><subject>Satellites</subject><subject>Spatial resolution</subject><subject>Teachers</subject><subject>Texture</subject><subject>Training</subject><subject>transfer 
learning</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkD1PwzAQhi0EEqXwBxCDJeYU-2wnDhtEfERqYWhBbJbjXKqUNCl2KsS_J6UdmO6G93nv9BByydmEc5beLLL5-2ICDMRECACZsCMy4krpCICp42FnikcauDolZyGsGONSy2REPmb38zx6wf6WzrZNX6-70jb03ja2dVjSOTZVNEXr27pd0rzt0VvX111LB-S785-06jzN13aJNGtsCHVVO7sLnJOTyjYBLw5zTN4eHxbZczR9fcqzu2nkhi_7iCMXqhJlUdgyKUXBXCqVEmkau6LgAJCWwlVpLItEKqG1lTxmyJIy5hIUFmJMrve9G999bTH0ZtVtfTucNIIpAanWIIYU7FPOdyF4rMzG12vrfwxnZmfQ_Bk0O4PmYHCArvZQjYj_AEhEwrT4BUTYbA0</recordid><startdate>20240501</startdate><enddate>20240501</enddate><creator>Ma, Mengru</creator><creator>Ma, Wenping</creator><creator>Jiao, Licheng</creator><creator>Liu, Xu</creator><creator>Liu, Fang</creator><creator>Li, Lingling</creator><creator>Yang, Shuyuan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-3354-9617</orcidid><orcidid>https://orcid.org/0000-0002-6130-2518</orcidid><orcidid>https://orcid.org/0000-0002-6802-539X</orcidid><orcidid>https://orcid.org/0000-0002-4796-5737</orcidid><orcidid>https://orcid.org/0000-0002-5669-9354</orcidid><orcidid>https://orcid.org/0000-0001-8872-2195</orcidid><orcidid>https://orcid.org/0000-0002-8780-5455</orcidid></search><sort><creationdate>20240501</creationdate><title>MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification</title><author>Ma, Mengru ; Ma, Wenping ; Jiao, Licheng ; Liu, Xu ; Liu, Fang ; Li, Lingling ; Yang, 
Shuyuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-1e135f3dbbad7d3b0c94553996cbb12229d3cf964b745388a4160e07d61425eb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Equalization</topic><topic>Feature extraction</topic><topic>Image classification</topic><topic>Knowledge engineering</topic><topic>Knowledge management</topic><topic>Learning</topic><topic>Modules</topic><topic>multimodal</topic><topic>Remote sensing</topic><topic>Satellite observation</topic><topic>Satellites</topic><topic>Spatial resolution</topic><topic>Teachers</topic><topic>Texture</topic><topic>Training</topic><topic>transfer learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ma, Mengru</creatorcontrib><creatorcontrib>Ma, Wenping</creatorcontrib><creatorcontrib>Jiao, Licheng</creatorcontrib><creatorcontrib>Liu, Xu</creatorcontrib><creatorcontrib>Liu, Fang</creatorcontrib><creatorcontrib>Li, Lingling</creatorcontrib><creatorcontrib>Yang, Shuyuan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ma, Mengru</au><au>Ma, Wenping</au><au>Jiao, Licheng</au><au>Liu, Xu</au><au>Liu, Fang</au><au>Li, Lingling</au><au>Yang, Shuyuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification</atitle><jtitle>IEEE transactions on circuits and systems for video technology</jtitle><stitle>TCSVT</stitle><date>2024-05-01</date><risdate>2024</risdate><volume>34</volume><issue>5</issue><spage>3819</spage><epage>3833</epage><pages>3819-3833</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>A growing number of earth observation satellites are able to simultaneously gather multimodal images of the same area due to the expanding availability and resolution of satellite remote sensing data. This paper proposes a novel multimodal balanced self-learning interaction network (MBSI-Net) for the classification task. It involves a dual-branch teacher-student network that enables knowledge interaction and transfer between the multimodalities. Firstly, in order to introduce statistical information in addition to local and global structural information, a texture feature equalization module (TFE-Module) is proposed. This can enhance the texture information of features through histogram equalization and further improve the representation ability of features. Secondly, to enable the student network to provide timely feedback questions, the paper proposes a feature fusion module (F2-Module) that models and enhances teacher features through the student network. This helps to raise the classification's accuracy by incorporating information from multimodal images. Finally, the paper proposes a loss function based on structural similarity analysis to ensure balanced self-learning between the student and the teacher networks. 
Taking the multispectral (MS) and the panchromatic (PAN) images of the same scene as examples, through experimental verification, the proposed method can achieve good results on multiple datasets compared with other methods. Therefore, it offers an effective method for classifying and fusing multimodal data.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2023.3322470</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0003-3354-9617</orcidid><orcidid>https://orcid.org/0000-0002-6130-2518</orcidid><orcidid>https://orcid.org/0000-0002-6802-539X</orcidid><orcidid>https://orcid.org/0000-0002-4796-5737</orcidid><orcidid>https://orcid.org/0000-0002-5669-9354</orcidid><orcidid>https://orcid.org/0000-0001-8872-2195</orcidid><orcidid>https://orcid.org/0000-0002-8780-5455</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1051-8215
ispartof	IEEE transactions on circuits and systems for video technology, 2024-05, Vol.34 (5), p.3819-3833
issn	1051-8215 1558-2205
language	eng
recordid	cdi_proquest_journals_3053298823
source	IEEE Electronic Library (IEL)
subjects	Equalization; Feature extraction; Image classification; Knowledge engineering; Knowledge management; Learning; Modules; multimodal; Remote sensing; Satellite observation; Satellites; Spatial resolution; Teachers; Texture; Training; transfer learning
title	MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T00%3A43%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MBSI-Net:%20Multimodal%20Balanced%20Self-Learning%20Interaction%20Network%20for%20Image%20Classification&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Ma,%20Mengru&rft.date=2024-05-01&rft.volume=34&rft.issue=5&rft.spage=3819&rft.epage=3833&rft.pages=3819-3833&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2023.3322470&rft_dat=%3Cproquest_RIE%3E3053298823%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3053298823&rft_id=info:pmid/&rft_ieee_id=10273708&rfr_iscdi=true