Learning reliable modal weight with transformer for robust RGBT tracking
Many Siamese-based RGBT trackers have been designed in recent years for fast tracking. However, their correlation operation is a local linear matching process, which easily loses the semantic information that high-precision trackers inevitably require. In this paper, we propose a strong cross-modal model based on the transformer for robust RGBT tracking.
Saved in:
Published in: | Knowledge-based systems 2022-08, Vol.249, p.108945, Article 108945 |
---|---|
Main authors: | Feng, Mingzheng; Su, Jianbo |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
creator | Feng, Mingzheng Su, Jianbo |
description | Many Siamese-based RGBT trackers have been designed in recent years for fast tracking. However, their correlation operation is a local linear matching process, which easily loses the semantic information that high-precision trackers inevitably require. In this paper, we propose a strong cross-modal model based on the transformer for robust RGBT tracking. Specifically, a simple dual-flow convolutional network is designed to extract and fuse dual-modal features with comparatively low complexity. In addition, to enhance the feature representation and deepen the semantic features, a modal weight allocation strategy and a backbone feature extraction network based on a modified ResNet-50 are designed, respectively. An attention-based transformer feature fusion network is then adopted to strengthen long-distance feature association and reduce the loss of semantic information. Finally, a classification-regression subnetwork is investigated to accurately predict the state of the target. Extensive experiments on the RGBT234, RGBT210, GTOT and LasHeR datasets demonstrate outstanding tracking performance against state-of-the-art RGBT trackers. |
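The modal weight allocation strategy is only described at a high level in the abstract. As a minimal illustrative sketch — not the paper's actual formulation; the energy-based scoring, the softmax normalization, and all function names here are assumptions — reliability-weighted fusion of RGB and thermal feature maps might look like:

```python
import numpy as np

def modal_weights(feat_rgb, feat_tir):
    """Hypothetical reliability scoring: mean activation energy per modality,
    normalized with a softmax so the two weights sum to 1."""
    s = np.array([np.mean(np.abs(feat_rgb)), np.mean(np.abs(feat_tir))])
    e = np.exp(s - s.max())  # subtract max for numerical stability
    return e / e.sum()

def fuse(feat_rgb, feat_tir):
    """Weighted sum of the two modal feature maps."""
    w_rgb, w_tir = modal_weights(feat_rgb, feat_tir)
    return w_rgb * feat_rgb + w_tir * feat_tir

# Toy features: the thermal map carries stronger activations
# (e.g., a night scene), so it receives the larger weight.
rgb = np.random.default_rng(0).normal(0.1, 0.1, (8, 8))
tir = np.random.default_rng(1).normal(1.0, 0.1, (8, 8))
w = modal_weights(rgb, tir)
fused = fuse(rgb, tir)
```

The point of such a scheme is that the fusion degrades gracefully: when one modality is uninformative, its weight shrinks rather than corrupting the fused features.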
•An RGBT tracking framework based on the transformer is designed, which enhances long-distance feature association and reduces the loss of semantic information. To our knowledge, this is the first work to incorporate the transformer into RGBT tracking.
•A shallow convolutional network is designed to extract and fuse multi-modal information, which significantly simplifies the computation. Moreover, an optimal modal weight allocation strategy is proposed to obtain reliable weights for effectively optimizing the fused features.
•A classification and regression subnetwork with an added center branch is adopted to reduce background interference, further improving the accuracy of target prediction.
•Extensive experimental results on four large benchmark datasets, RGBT234 (Li et al., 2019), RGBT210 (Li et al., 2017), GTOT (Li et al., 2016) and LasHeR (Li et al., 2022), indicate that the proposed tracker outperforms state-of-the-art RGBT trackers. |
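The long-distance feature association the highlights attribute to the transformer comes from scaled dot-product attention, whose weight matrix links every query position to every key position in a single step, unlike a local correlation window. A self-contained NumPy sketch (the template/search-region token framing is an assumption about how a tracker would apply it, not this paper's exact architecture):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q @ k.T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # rows sum to 1
    return w @ v

# Toy cross-attention: template tokens (queries) attend to search-region
# tokens (keys/values), so each template position can aggregate features
# from any search position, regardless of spatial distance.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))    # 4 template tokens, 16-dim features
kv = rng.normal(size=(64, 16))  # 64 search-region tokens
out = attention(q, kv, kv)
```

Because the softmax weights span all 64 search tokens at once, semantic context from distant regions is preserved rather than discarded by a local matching window.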
doi_str_mv | 10.1016/j.knosys.2022.108945 |
format | Article |
publisher | Amsterdam: Elsevier B.V. |
date | 2022-08-05 |
fulltext | fulltext |
identifier | ISSN: 0950-7051 |
ispartof | Knowledge-based systems, 2022-08, Vol.249, p.108945, Article 108945 |
issn | 0950-7051 1872-7409 |
language | eng |
recordid | cdi_proquest_journals_2687833712 |
source | Elsevier ScienceDirect Journals |
subjects | Feature extraction; RGBT tracking; Robustness; Semantic features; Semantics; Tracking; Transformer; Transformers |
title | Learning reliable modal weight with transformer for robust RGBT tracking |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T20%3A57%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20reliable%20modal%20weight%20with%20transformer%20for%20robust%20RGBT%20tracking&rft.jtitle=Knowledge-based%20systems&rft.au=Feng,%20Mingzheng&rft.date=2022-08-05&rft.volume=249&rft.spage=108945&rft.pages=108945-&rft.artnum=108945&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2022.108945&rft_dat=%3Cproquest_cross%3E2687833712%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2687833712&rft_id=info:pmid/&rft_els_id=S0950705122004579&rfr_iscdi=true |