Learning reliable modal weight with transformer for robust RGBT tracking


Published in: Knowledge-based systems, 2022-08, Vol. 249, p. 108945, Article 108945
Authors: Feng, Mingzheng; Su, Jianbo
Format: Article
Language: English
Online access: Full text
Abstract: Many Siamese-based RGBT trackers have been designed in recent years for fast tracking. However, their correlation operation is a local linear matching process, which easily loses the semantic information required by high-precision trackers. In this paper, we propose a strong cross-modal model based on the transformer for robust RGBT tracking. Specifically, a simple dual-flow convolutional network is designed to extract and fuse dual-modal features with comparatively low complexity. In addition, to enhance feature representation and deepen semantic features, a modal weight allocation strategy and a backbone feature extraction network based on a modified ResNet-50 are designed, respectively. An attention-based transformer feature fusion network is then adopted to improve long-distance feature association and reduce the loss of semantic information. Finally, a classification and regression subnetwork is investigated to accurately predict the state of the target. Extensive experiments on the RGBT234, RGBT210, GTOT and LasHeR datasets demonstrate tracking performance that surpasses state-of-the-art RGBT trackers.

Highlights:
• An RGBT tracking framework based on the transformer is designed, which enhances long-distance feature association and reduces the loss of semantic information. To our knowledge, this is the first work to incorporate the transformer into RGBT tracking.
• A shallow convolutional network is designed to extract and fuse multi-modal information, which significantly simplifies computation. Moreover, an optimal modal weight allocation strategy is proposed to obtain reliable weights for effectively optimizing the fused features.
• A classification and regression subnetwork with an added center branch is adopted to reduce background interference, further improving the accuracy of target prediction.
• Extensive experimental results on four large benchmark datasets, RGBT234 (Li et al. 2019), RGBT210 (Li et al. 2017), GTOT (Li et al. 2016) and LasHeR (Li et al. 2022), indicate that the proposed tracker outperforms state-of-the-art RGBT trackers.
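The abstract describes a modal weight allocation strategy that produces reliable per-modality weights before fusing the RGB and thermal feature maps, but does not give the formula here. Below is a minimal illustrative sketch of one plausible formulation, assuming a scalar quality score per modality is normalized by a softmax into fusion weights; all function names and the mean-activation "quality" score are assumptions for illustration, not the paper's learned mechanism.

```python
import numpy as np

def allocate_modal_weights(feat_rgb: np.ndarray, feat_tir: np.ndarray) -> np.ndarray:
    """Softmax weights over the two modalities (RGB, thermal).

    Each modality's quality score is approximated here by its mean
    absolute activation; the paper learns these scores, so this is
    only a stand-in to show the weighting-then-fusion pattern.
    """
    scores = np.array([np.mean(np.abs(feat_rgb)), np.mean(np.abs(feat_tir))])
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

def fuse_modalities(feat_rgb: np.ndarray, feat_tir: np.ndarray) -> np.ndarray:
    """Weighted sum of the two modality feature maps."""
    w = allocate_modal_weights(feat_rgb, feat_tir)
    return w[0] * feat_rgb + w[1] * feat_tir

# Toy example: two 4x4 single-channel feature maps, RGB more active.
rgb = np.ones((4, 4))
tir = np.zeros((4, 4))
w = allocate_modal_weights(rgb, tir)
fused = fuse_modalities(rgb, tir)
print(w, fused[0, 0])
```

The key property this sketch preserves is that the weights sum to one and shift toward the more informative modality, so an unreliable channel (e.g. thermal at daytime, RGB at night) is down-weighted rather than discarded.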
DOI: 10.1016/j.knosys.2022.108945
ISSN: 0950-7051
eISSN: 1872-7409
Source: Elsevier ScienceDirect Journals
Subjects: Feature extraction; RGBT tracking; Robustness; Semantic features; Semantics; Tracking; Transformer; Transformers