FDTrack: A Dual-Head Focus Tracking Network With Frequency Enhancement

The RGB-T tracking approach combines the advantages of visible and thermal sensors to achieve accurate target tracking in complex scenarios. However, previous RGB-T trackers based on the self-attention (SA) mechanism overlook crucial high-frequency information (such as texture, edges, and colors) that is essential for object prediction.

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE sensors journal 2025-01, Vol.25 (2), p.3879-3897
Main Authors: Gao, Zhao; Zhou, Dongming; Cao, Jinde; Liu, Yisong; Shan, Qingqing
Format: Article
Language: eng
Subjects:
Online Access: Order full text
container_end_page 3897
container_issue 2
container_start_page 3879
container_title IEEE sensors journal
container_volume 25
creator Gao, Zhao
Zhou, Dongming
Cao, Jinde
Liu, Yisong
Shan, Qingqing
description The RGB-T tracking approach combines the advantages of visible and thermal sensors to achieve accurate target tracking in complex scenarios. However, previous RGB-T trackers based on the self-attention (SA) mechanism overlook crucial high-frequency information (such as texture, edges, and colors) that is essential for object prediction. To address these challenges, we propose a frequency-enhanced dual-head focus tracking network (FDTrack) for RGB-T tracking. FDTrack comprises four main components: high-frequency feature enhancement (HFFE), wavelet multifrequency (WMF) interaction, autonomous modality prediction (AMP), and search focus preprocessing (SFP). HFFE refines the features from the ViT backbones within specific modalities by adaptively amplifying high-frequency features. In contrast, WMF facilitates communication between different frequency bands to enhance the interaction of RGB-T features from a frequency perspective. To improve tracking robustness under extreme scenes, AMP incorporates dual prediction heads and determines the final outcome through feature matching. SFP adjusts the convolution kernel size based on pixel-to-target distance and preprocesses the search region with Gaussian blur to reduce background clutter interference and emphasize the target. Extensive experimental results demonstrate that FDTrack achieves competitive performance compared to state-of-the-art algorithms across various datasets, including RGBT210, RGBT234, and LasHeR, showcasing its cutting-edge capabilities in this field.
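The search focus preprocessing (SFP) idea described in the abstract — emphasizing the target by Gaussian-blurring the search region more strongly with distance from the target — can be sketched as follows. This is a minimal illustration of the concept only, not the paper's implementation: the function and parameter names (`focus_preprocess`, `radius`, `max_sigma`) are hypothetical, and where the paper adapts the convolution kernel size per pixel, this sketch approximates that by blending one fully blurred copy with the original using a distance-based weight.

```python
import numpy as np


def _gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array (rows, then columns)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)


def focus_preprocess(search, center, radius, max_sigma=3.0):
    """Blur the search region progressively with distance from the target.

    search: H x W float image; center: (row, col) predicted target position;
    radius: rough target extent in pixels. All names are illustrative
    assumptions, not the paper's actual API.
    """
    h, w = search.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist = np.hypot(rows - center[0], cols - center[1])
    # Weight is 0 inside the target radius and ramps up to 1 in the background,
    # so pixels near the target stay sharp and clutter far away is suppressed.
    weight = np.clip((dist - radius) / radius, 0.0, 1.0)
    blurred = _gaussian_blur(search, max_sigma)
    return (1.0 - weight) * search + weight * blurred
```

With this blending scheme, pixels within `radius` of the predicted center pass through unchanged, which mirrors the stated goal of reducing background clutter interference while keeping the target region intact.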
doi_str_mv 10.1109/JSEN.2024.3506929
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1530-437X
ispartof IEEE sensors journal, 2025-01, Vol.25 (2), p.3879-3897
issn 1530-437X
1558-1748
language eng
recordid cdi_proquest_journals_3155820531
source IEEE Electronic Library (IEL)
subjects Accuracy
Algorithms
Clutter
Feature extraction
Frequencies
Frequency enhanced
Head
Image processing
Ions
Iron
Magnetic heads
multifrequency interaction
Object tracking
Real-time systems
RGB-T tracking
Target tracking
Tracking
Tracking networks
Transformers
visual focusing
title FDTrack: A Dual-Head Focus Tracking Network With Frequency Enhancement
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T07%3A52%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FDTrack:%20A%20Dual-Head%20Focus%20Tracking%20Network%20With%20Frequency%20Enhancement&rft.jtitle=IEEE%20sensors%20journal&rft.au=Gao,%20Zhao&rft.date=2025-01-15&rft.volume=25&rft.issue=2&rft.spage=3879&rft.epage=3897&rft.pages=3879-3897&rft.issn=1530-437X&rft.eissn=1558-1748&rft.coden=ISJEAZ&rft_id=info:doi/10.1109/JSEN.2024.3506929&rft_dat=%3Cproquest_RIE%3E3155820531%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3155820531&rft_id=info:pmid/&rft_ieee_id=10778233&rfr_iscdi=true