CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion
Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches attempt to construct various loss functions to preserve complementary features, while neglecting the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. Moreover, most methods focus on strengthening the network by increasing its depth while neglecting the importance of feature transmission, causing vital information to degrade. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and to avoid artifacts emerging in the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, its foreground target/background detail part is pulled close to the infrared/visible source and pushed far away from the visible/infrared source in the representation space. We further exploit image characteristics to provide data-sensitive weights, allowing our loss function to build a more reliable relationship with the source images. A multi-level attention module is established to learn rich hierarchical feature representations and to comprehensively transfer features in the fusion process. We also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance, positron emission tomography, and single photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details.
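The coupled contrastive constraint described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal PyTorch-style example under assumed ingredients (a frozen feature encoder `encode`, a binary foreground mask `mask`, and L1 distances in feature space) that pulls the foreground of the fused image toward the infrared source and the background toward the visible source while pushing each away from the opposite modality.

```python
import torch
import torch.nn.functional as F

def contrastive_term(anchor, positive, negative, eps=1e-6):
    # Pull the anchor toward the positive sample and push it away from the
    # negative one by minimising the ratio of the two feature distances.
    d_pos = F.l1_loss(anchor, positive)
    d_neg = F.l1_loss(anchor, negative)
    return d_pos / (d_neg + eps)

def coupled_contrastive_loss(encode, fused, ir, vis, mask):
    """Foreground (target) features of the fused image are attracted to the
    infrared source and repelled from the visible source; background (detail)
    features behave the opposite way. `encode` and `mask` are assumptions."""
    fg = lambda x: encode(x * mask)          # foreground part of an image
    bg = lambda x: encode(x * (1.0 - mask))  # background part of an image
    loss_fg = contrastive_term(fg(fused), fg(ir), fg(vis))
    loss_bg = contrastive_term(bg(fused), bg(vis), bg(ir))
    return loss_fg + loss_bg

# Hypothetical usage with an identity "encoder" on toy tensors:
fused = torch.rand(1, 1, 64, 64)
ir, vis = torch.rand_like(fused), torch.rand_like(fused)
mask = (torch.rand_like(fused) > 0.5).float()
loss = coupled_contrastive_loss(lambda x: x, fused, ir, vis, mask)
```

The ratio of positive to negative distances keeps the term scale-free; the paper's actual formulation, its data-sensitive weighting, and its choice of representation space may differ from this sketch.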
Saved in:
Published in: | International journal of computer vision 2024-05, Vol.132 (5), p.1748-1775 |
---|---|
Main authors: | Liu, Jinyuan; Lin, Runjia; Wu, Guanyao; Liu, Risheng; Luo, Zhongxuan; Fan, Xin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1775 |
---|---|
container_issue | 5 |
container_start_page | 1748 |
container_title | International journal of computer vision |
container_volume | 132 |
creator | Liu, Jinyuan; Lin, Runjia; Wu, Guanyao; Liu, Risheng; Luo, Zhongxuan; Fan, Xin |
description | Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches attempt to construct various loss functions to preserve complementary features, while neglecting the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. Moreover, most methods focus on strengthening the network by increasing its depth while neglecting the importance of feature transmission, causing vital information to degrade. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and to avoid artifacts emerging in the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, its foreground target/background detail part is pulled close to the infrared/visible source and pushed far away from the visible/infrared source in the representation space. We further exploit image characteristics to provide data-sensitive weights, allowing our loss function to build a more reliable relationship with the source images. A multi-level attention module is established to learn rich hierarchical feature representations and to comprehensively transfer features in the fusion process. We also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance, positron emission tomography, and single photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details. |
doi_str_mv | 10.1007/s11263-023-01952-1 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0920-5691 |
ispartof | International journal of computer vision, 2024-05, Vol.132 (5), p.1748-1775 |
issn | 0920-5691; 1573-1405 |
language | eng |
recordid | cdi_proquest_journals_3051754345 |
source | Springer Nature - Complete Springer Journals |
subjects | Artificial Intelligence; Computed tomography; Computer Imaging; Computer Science; Computer vision; Degeneration; Image Processing and Computer Vision; Infrared imagery; Learning; Magnetic resonance imaging; Medical imaging; Pattern Recognition; Pattern Recognition and Graphics; Photon emission; Positron emission; Representations; Tomography; Vision |
title | CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T07%3A45%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CoCoNet:%20Coupled%20Contrastive%20Learning%20Network%20with%20Multi-level%20Feature%20Ensemble%20for%20Multi-modality%20Image%20Fusion&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Liu,%20Jinyuan&rft.date=2024-05-01&rft.volume=132&rft.issue=5&rft.spage=1748&rft.epage=1775&rft.pages=1748-1775&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-023-01952-1&rft_dat=%3Cproquest_cross%3E3051754345%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3051754345&rft_id=info:pmid/&rfr_iscdi=true |