Learning reliable modal weight with transformer for robust RGBT tracking


Published in: Knowledge-based systems, 2022-08, Vol. 249, p. 108945, Article 108945
Authors: Feng, Mingzheng; Su, Jianbo
Format: Article
Language: English
Online access: Full text
Abstract: Many Siamese-based RGBT trackers have been designed in recent years for fast tracking. However, their correlation operation is a local linear matching process, which easily loses the semantic information required by high-precision trackers. In this paper, we propose a strong cross-modal model based on the transformer for robust RGBT tracking. Specifically, a simple dual-flow convolutional network is designed to extract and fuse dual-modal features with comparatively low complexity. In addition, to enhance feature representation and deepen semantic features, a modal weight allocation strategy and a backbone feature extraction network based on a modified ResNet-50 are designed, respectively. An attention-based transformer feature fusion network is then adopted to improve long-distance feature association and reduce the loss of semantic information. Finally, a classification and regression subnetwork is investigated to accurately predict the state of the target. Extensive experiments on the RGBT234, RGBT210, GTOT and LasHeR datasets demonstrate tracking performance that surpasses state-of-the-art RGBT trackers.

Highlights:
• An RGBT tracking framework based on the transformer is designed, which enhances long-distance feature association and reduces the loss of semantic information. To our knowledge, this is the first work to incorporate the transformer into RGBT tracking.
• A shallow convolutional network is designed to extract and fuse multi-modal information, which significantly simplifies computation. Moreover, an optimal modal weight allocation strategy is proposed to obtain reliable weights for effectively optimizing the fused features.
• A classification and regression subnetwork with an added center branch is adopted to reduce background interference, further improving the accuracy of target prediction.
• Extensive experimental results on four large benchmark datasets, RGBT234 (Li et al. 2019), RGBT210 (Li et al. 2017), GTOT (Li et al. 2016) and LasHeR (Li et al. 2022), indicate that the proposed tracker outperforms state-of-the-art RGBT trackers.
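The abstract describes a modal weight allocation strategy that produces reliable per-modality weights before fusing the RGB and thermal feature maps, but does not give the formula here. Below is a minimal illustrative sketch of one plausible formulation, assuming a scalar quality score per modality is normalized by a softmax into fusion weights; all function names and the mean-activation "quality" score are assumptions for illustration, not the paper's learned mechanism.

```python
import numpy as np

def allocate_modal_weights(feat_rgb: np.ndarray, feat_tir: np.ndarray) -> np.ndarray:
    """Softmax weights over the two modalities (RGB, thermal).

    Each modality's quality score is approximated here by its mean
    absolute activation; the paper learns these scores, so this is
    only a stand-in to show the weighting-then-fusion pattern.
    """
    scores = np.array([np.mean(np.abs(feat_rgb)), np.mean(np.abs(feat_tir))])
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

def fuse_modalities(feat_rgb: np.ndarray, feat_tir: np.ndarray) -> np.ndarray:
    """Weighted sum of the two modality feature maps."""
    w = allocate_modal_weights(feat_rgb, feat_tir)
    return w[0] * feat_rgb + w[1] * feat_tir

# Toy example: two 4x4 single-channel feature maps, RGB more active.
rgb = np.ones((4, 4))
tir = np.zeros((4, 4))
w = allocate_modal_weights(rgb, tir)
fused = fuse_modalities(rgb, tir)
print(w, fused[0, 0])
```

The key property this sketch preserves is that the weights sum to one and shift toward the more informative modality, so an unreliable channel (e.g. thermal at daytime, RGB at night) is down-weighted rather than discarded.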
DOI: 10.1016/j.knosys.2022.108945
ISSN: 0950-7051
eISSN: 1872-7409
Source: Elsevier ScienceDirect Journals
Subjects: Feature extraction; RGBT tracking; Robustness; Semantic features; Semantics; Tracking; Transformer; Transformers