Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer

Image fusion models based on autoencoder networks have attracted increasing attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two identically structured CNN streams as the encoder; these cannot extract global features, owing to the local receptive field of convolutional operations, and lack the ability to extract features unique to infrared and visible images. This paper constructs a novel autoencoder-based image fusion network consisting of an encoder module, a fusion module, and a decoder module. In the encoder module, a CNN and a Transformer are combined to capture the local and global features of the source images simultaneously. In addition, novel contrast- and gradient-enhancement feature extraction blocks are designed for the infrared and visible images, respectively, to preserve the information specific to each source image. The feature maps produced by the encoder module are concatenated by the fusion module and fed to the decoder module to obtain the fused image. Experimental results on three datasets show that the proposed network preserves both the salient targets of the infrared images and the detailed information of the visible images, and outperforms several state-of-the-art methods in both subjective and objective evaluation. Moreover, the fused images produced by the proposed network achieve the highest mean average precision in target detection, which demonstrates that image fusion benefits downstream tasks.
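
The architecture the abstract describes (a two-stream encoder combining a CNN and a Transformer, concatenation-based fusion, and a shared decoder) can be illustrated with a minimal PyTorch sketch. This is a sketch of the general approach under stated assumptions, not the authors' implementation: the module names (CNNBranch, TransformerBranch, ContrastBlock, GradientBlock, FusionAutoencoder), channel widths, and the specific contrast and gradient filters are all hypothetical.

```python
# Minimal sketch of a CNN-Transformer autoencoder fusion pipeline in the
# spirit of the abstract. All names, channel widths, and the enhancement
# filters below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNBranch(nn.Module):
    """Local features via stacked 3x3 convolutions."""
    def __init__(self, in_ch: int = 1, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class TransformerBranch(nn.Module):
    """Global features: embed pixels, then self-attention over the full
    token sequence. Real models use patch or window attention to keep the
    sequence length manageable; full-resolution tokens suffice for a toy."""
    def __init__(self, in_ch: int = 1, ch: int = 32, heads: int = 4, depth: int = 2):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, ch, 3, padding=1)
        layer = nn.TransformerEncoderLayer(
            d_model=ch, nhead=heads, dim_feedforward=2 * ch, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        f = self.embed(x)                          # (B, C, H, W)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class ContrastBlock(nn.Module):
    """Hypothetical contrast enhancement for the infrared stream:
    amplify each feature's deviation from its local mean."""
    def __init__(self, k: int = 7):
        super().__init__()
        self.pool = nn.AvgPool2d(k, stride=1, padding=k // 2)

    def forward(self, f):
        return f + (f - self.pool(f))


class GradientBlock(nn.Module):
    """Hypothetical gradient enhancement for the visible stream:
    re-inject edges via a fixed depthwise Laplacian filter."""
    def __init__(self, ch: int = 32):
        super().__init__()
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("kernel", lap.view(1, 1, 3, 3).repeat(ch, 1, 1, 1))

    def forward(self, f):
        edges = F.conv2d(f, self.kernel, padding=1, groups=f.shape[1])
        return f + edges


class FusionAutoencoder(nn.Module):
    """Two-stream encoder -> concatenation fusion -> shared decoder."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.ir_cnn, self.ir_tf = CNNBranch(ch=ch), TransformerBranch(ch=ch)
        self.vis_cnn, self.vis_tf = CNNBranch(ch=ch), TransformerBranch(ch=ch)
        self.ir_contrast, self.vis_grad = ContrastBlock(), GradientBlock(ch)
        self.decoder = nn.Sequential(              # four ch-wide maps in
            nn.Conv2d(4 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ir, vis):
        f_ir = torch.cat([self.ir_contrast(self.ir_cnn(ir)), self.ir_tf(ir)], dim=1)
        f_vis = torch.cat([self.vis_grad(self.vis_cnn(vis)), self.vis_tf(vis)], dim=1)
        fused = torch.cat([f_ir, f_vis], dim=1)    # fusion module: concatenation
        return self.decoder(fused)


# Smoke test on small single-channel inputs (full-resolution attention
# makes large inputs expensive in this toy version).
ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
print(FusionAutoencoder()(ir, vis).shape)  # torch.Size([1, 1, 64, 64])
```

Note that in the autoencoder fusion paradigm the encoder and decoder are usually trained for image reconstruction first, with the fusion step applied at inference time; the sketch wires fusion in directly only to keep the forward pass self-contained.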

Bibliographic Details
Published in: IEEE Access, 2023, Vol. 11, p. 78956-78969
Main authors: Wang, Hongmei; Li, Lin; Li, Chenkai; Lu, Xuanyu
Format: Article
Language: English
Publisher: IEEE, Piscataway
DOI: 10.1109/ACCESS.2023.3298437
ISSN/EISSN: 2169-3536
Subjects: Coders; Computer vision; convolutional neural network; Convolutional neural networks; Feature extraction; Generators; Image acquisition; Image contrast; Image enhancement; Image fusion; infrared image; Infrared imagery; Infrared imaging; Modules; Target detection; Task analysis; Training; transformer; Transformers; visible image; Visualization
Online Access: Full text (IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek)