Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer

Image fusion models based on autoencoder networks have attracted growing attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two-stream CNNs with the same structure as the encoder, which are unable to extract global features due to the local recep...

Bibliographic Details
Published in:	IEEE access 2023, Vol.11, p.78956-78969
Main authors:	Wang, Hongmei; Li, Lin; Li, Chenkai; Lu, Xuanyu
Format:	Article
Language:	eng
Subjects:	Coders; Computer vision; convolutional neural network; Convolutional neural networks; Feature extraction; Generators; Image acquisition; Image contrast; Image enhancement; Image fusion; infrared image; Infrared imagery; Infrared imaging; Modules; Target detection; Task analysis; Training; transformer; Transformers; visible image; Visualization
Online access:	Full text

container_end_page	78969
container_issue
container_start_page	78956
container_title	IEEE access
container_volume	11
creator	Wang, Hongmei; Li, Lin; Li, Chenkai; Lu, Xuanyu
description	Image fusion models based on autoencoder networks have attracted growing attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two-stream CNNs with the same structure as the encoder, which are unable to extract global features due to the local receptive field of convolutional operations and lack the ability to extract unique features from infrared and visible images. This paper constructs a novel autoencoder-based image fusion network that consists of an encoder module, a fusion module and a decoder module. In the encoder module, a CNN and a Transformer are combined to capture the local and global features of the source images simultaneously. In addition, novel contrast enhancement and gradient enhancement feature extraction blocks are designed for infrared and visible images, respectively, to preserve the information specific to each source image. The feature maps obtained from the encoder module are concatenated by the fusion module and fed to the decoder module to obtain the fused image. Experimental results on three datasets show that the proposed network better preserves both the clear targets of infrared images and the detailed information of visible images, and outperforms some state-of-the-art methods in both subjective and objective evaluation. In addition, the fused images obtained by the proposed network achieve the highest mean average precision in target detection, which shows that image fusion is beneficial for downstream tasks.
doi_str_mv	10.1109/ACCESS.2023.3298437
format	Article
publisher	Piscataway: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
coden	IAECCG
eissn	2169-3536
tpages	14
orcid	https://orcid.org/0000-0001-6074-0199
ieee_id	10192407
doaj_id	oai_doaj_org_article_5e2fd0c93aca46a5a60fcb07e5e1d39e
linktohtml	https://ieeexplore.ieee.org/document/10192407
oa	free_for_read
toplevel	peer_reviewed; online_resources
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2023, Vol.11, p.78956-78969
issn	2169-3536 2169-3536
language	eng
recordid	cdi_proquest_journals_2844895957
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Coders; Computer vision; convolutional neural network; Convolutional neural networks; Feature extraction; Generators; Image acquisition; Image contrast; Image enhancement; Image fusion; infrared image; Infrared imagery; Infrared imaging; Modules; Target detection; Task analysis; Training; transformer; Transformers; visible image; Visualization
title	Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T23%3A30%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Infrared%20and%20Visible%20Image%20Fusion%20Based%20on%20Autoencoder%20Composed%20of%20CNN-Transformer&rft.jtitle=IEEE%20access&rft.au=Wang,%20Hongmei&rft.date=2023&rft.volume=11&rft.spage=78956&rft.epage=78969&rft.pages=78956-78969&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3298437&rft_dat=%3Cproquest_doaj_%3E2844895957%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2844895957&rft_id=info:pmid/&rft_ieee_id=10192407&rft_doaj_id=oai_doaj_org_article_5e2fd0c93aca46a5a60fcb07e5e1d39e&rfr_iscdi=true