Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer
Image fusion models based on autoencoder networks are attracting more attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two-stream CNNs with the same structure as the encoder, which are unable to extract global features due to the local recep...
Published in: | IEEE Access 2023, Vol. 11, pp. 78956-78969 |
---|---|
Main authors: | Wang, Hongmei ; Li, Lin ; Li, Chenkai ; Lu, Xuanyu |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 78969 |
---|---|
container_issue | |
container_start_page | 78956 |
container_title | IEEE access |
container_volume | 11 |
creator | Wang, Hongmei ; Li, Lin ; Li, Chenkai ; Lu, Xuanyu |
description | Image fusion models based on autoencoder networks are attracting more attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two-stream CNNs with the same structure as the encoder; these cannot extract global features because of the local receptive field of convolutional operations, and they lack the ability to extract the features unique to infrared and visible images. This paper constructs a novel autoencoder-based image fusion network consisting of an encoder module, a fusion module and a decoder module. In the encoder module, a CNN and a Transformer are combined to capture the local and global features of the source images simultaneously. In addition, novel contrast enhancement and gradient enhancement feature extraction blocks are designed for the infrared and visible images, respectively, to preserve the information specific to each source image. The feature maps produced by the encoder module are concatenated by the fusion module and fed to the decoder module to obtain the fused image. Experimental results on three datasets show that the proposed network better preserves both the clear targets of infrared images and the detailed information of visible images, and that it outperforms some state-of-the-art methods in both subjective and objective evaluation. In addition, the fused images produced by the proposed network achieve the highest mean average precision in target detection, which indicates that image fusion is beneficial for downstream tasks. |
doi_str_mv | 10.1109/ACCESS.2023.3298437 |
format | Article |
coden | IAECCG |
eissn | 2169-3536 |
linktohtml | https://ieeexplore.ieee.org/document/10192407 |
oa | free_for_read |
orcid | https://orcid.org/0000-0001-6074-0199 |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
toplevel | peer_reviewed ; online_resources |
tpages | 14 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2023, Vol.11, p.78956-78969 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2844895957 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Coders ; Computer vision ; convolutional neural network ; Convolutional neural networks ; Feature extraction ; Generators ; Image acquisition ; Image contrast ; Image enhancement ; Image fusion ; infrared image ; Infrared imagery ; Infrared imaging ; Modules ; Target detection ; Task analysis ; Training ; transformer ; Transformers ; visible image ; Visualization |
title | Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T23%3A30%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Infrared%20and%20Visible%20Image%20Fusion%20Based%20on%20Autoencoder%20Composed%20of%20CNN-Transformer&rft.jtitle=IEEE%20access&rft.au=Wang,%20Hongmei&rft.date=2023&rft.volume=11&rft.spage=78956&rft.epage=78969&rft.pages=78956-78969&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3298437&rft_dat=%3Cproquest_doaj_%3E2844895957%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2844895957&rft_id=info:pmid/&rft_ieee_id=10192407&rft_doaj_id=oai_doaj_org_article_5e2fd0c93aca46a5a60fcb07e5e1d39e&rfr_iscdi=true |
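
The description field says that the fusion module concatenates the feature images produced by the encoder before they are passed to the decoder. The following is a minimal sketch of channel-wise concatenation only, assuming each feature map is a nested list laid out as channels × height × width. The name `concat_fusion`, the data layout and the toy values are hypothetical and are not taken from the paper; the encoder and decoder are not modelled here.

```python
from typing import List, Tuple

# Assumed layout for illustration: channels x height x width.
FeatureMap = List[List[List[float]]]


def spatial_shape(feat: FeatureMap) -> Tuple[int, int]:
    """Return (height, width) of the first channel of a feature map."""
    return (len(feat[0]), len(feat[0][0]))


def concat_fusion(ir_feat: FeatureMap, vis_feat: FeatureMap) -> FeatureMap:
    """Stack infrared and visible feature maps along the channel axis.

    Hypothetical helper: the paper's fusion module may differ in layout
    and implementation; this only illustrates concatenation.
    """
    # Only the first channel of each input is compared for spatial size.
    if spatial_shape(ir_feat) != spatial_shape(vis_feat):
        raise ValueError("feature maps must share height and width")
    # Infrared channels come first, followed by the visible channels.
    return ir_feat + vis_feat


# Toy inputs: 2 infrared channels and 1 visible channel, each 2x2.
ir = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
vis = [[[0.1, 0.2], [0.3, 0.4]]]
fused = concat_fusion(ir, vis)
print(len(fused))  # number of channels after fusion
```

Concatenation keeps every channel from both sources, so combining them is left to whatever consumes the result (in the paper, the decoder) rather than to a hand-written weighting rule.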