Deep Learning for Visual Tracking: A Comprehensive Survey


Bibliographic Details
Published in: IEEE Transactions on Intelligent Transportation Systems, 2022-05, Vol. 23 (5), pp. 3943-3968
Authors: Marvasti-Zadeh, Seyed Mojtaba; Cheng, Li; Ghanei-Yakhdan, Hossein; Kasaei, Shohreh
Format: Article
Language: English
Online access: Order full text
Abstract: Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which a considerable number of methods have been developed and have demonstrated significant progress in recent years, predominantly driven by deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, by conducting critical quantitative and qualitative analyses of these state-of-the-art trackers, their pros and cons under various common scenarios are investigated. The survey may serve as a gentle guide for practitioners to weigh when and under what conditions to choose which method(s). It also facilitates a discussion of ongoing issues and sheds light on promising research directions.
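The abstract notes that the survey summarizes the evaluation metrics of popular tracking benchmarks. As a minimal illustration of the standard overlap-based evaluation used by OTB-style benchmarks (not the authors' evaluation code), the sketch below computes the intersection-over-union (IoU) between a predicted and a ground-truth bounding box, and a per-sequence success rate at a fixed overlap threshold; the function names and the `(x, y, w, h)` box convention are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Coordinates of the intersection rectangle.
    ix1 = max(ax, bx)
    iy1 = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose predicted box overlaps ground truth above threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(o >= threshold for o in overlaps) / len(overlaps)
```

Sweeping `threshold` from 0 to 1 and plotting the resulting success rates yields the familiar "success plot" whose area under the curve is used to rank trackers on these benchmarks.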
DOI: 10.1109/TITS.2020.3046478
Publisher: IEEE (New York)
ISSN: 1524-9050
EISSN: 1558-0016
Source: IEEE/IET Electronic Library (IEL)
Subjects:
appearance modeling
Benchmark testing
Benchmarks
Computer architecture
Computer vision
Correlation
Datasets
Deep learning
Evaluation
Exploitation
Feature extraction
Optical tracking
Target tracking
Tracking
Training
Visual tracking
Visualization