Deep Learning for Visual Tracking: A Comprehensive Survey
Published in: | IEEE transactions on intelligent transportation systems 2022-05, Vol.23 (5), p.3943-3968 |
---|---|
Main authors: | Marvasti-Zadeh, Seyed Mojtaba ; Cheng, Li ; Ghanei-Yakhdan, Hossein ; Kasaei, Shohreh |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 3968 |
---|---|
container_issue | 5 |
container_start_page | 3943 |
container_title | IEEE transactions on intelligent transportation systems |
container_volume | 23 |
creator | Marvasti-Zadeh, Seyed Mojtaba ; Cheng, Li ; Ghanei-Yakhdan, Hossein ; Kasaei, Shohreh |
description | Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which considerable methods have been developed and demonstrated with significant progress in recent years - predominantly by recent deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects of: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks of OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, by conducting critical analyses of these state-of-the-art trackers quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. It may serve as a gentle use guide for practitioners to weigh when and under what conditions to choose which method(s). It also facilitates a discussion on ongoing issues and sheds light on promising research directions. |
doi_str_mv | 10.1109/TITS.2020.3046478 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1524-9050 |
ispartof | IEEE transactions on intelligent transportation systems, 2022-05, Vol.23 (5), p.3943-3968 |
issn | 1524-9050 (ISSN) ; 1558-0016 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2659347107 |
source | IEEE/IET Electronic Library (IEL) |
subjects | appearance modeling ; Benchmark testing ; Benchmarks ; Computer architecture ; Computer vision ; Correlation ; Datasets ; Deep learning ; Evaluation ; Exploitation ; Feature extraction ; Optical tracking ; Target tracking ; Tracking ; Training ; Visual tracking ; Visualization |
title | Deep Learning for Visual Tracking: A Comprehensive Survey |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T18%3A55%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Learning%20for%20Visual%20Tracking:%20A%20Comprehensive%20Survey&rft.jtitle=IEEE%20transactions%20on%20intelligent%20transportation%20systems&rft.au=Marvasti-Zadeh,%20Seyed%20Mojtaba&rft.date=2022-05-01&rft.volume=23&rft.issue=5&rft.spage=3943&rft.epage=3968&rft.pages=3943-3968&rft.issn=1524-9050&rft.eissn=1558-0016&rft.coden=ITISFG&rft_id=info:doi/10.1109/TITS.2020.3046478&rft_dat=%3Cproquest_RIE%3E2659347107%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2659347107&rft_id=info:pmid/&rft_ieee_id=9339950&rfr_iscdi=true |