Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2018-10, Vol. 28 (10), p. 2849-2860
Main authors: Xue, Wanli; Xu, Chao; Feng, Zhiyong
Format: Article
Language: English (eng)
Online access: Order full text
container_end_page 2860
container_issue 10
container_start_page 2849
container_title IEEE transactions on circuits and systems for video technology
container_volume 28
creator Xue, Wanli; Xu, Chao; Feng, Zhiyong
description To tackle the incomplete and inaccurate samples used by most tracking-by-detection algorithms, this paper presents an object tracking algorithm termed multi-scale spatio-temporal context (MSTC) learning tracking. MSTC collaboratively explores three different types of spatio-temporal context, namely the long-term historical targets, the medium-term stable scene (i.e., a short, continuous, and stable video sequence), and the short-term overall samples, to improve tracking efficiency and reduce drift. Unlike the conventional multi-timescale tracking paradigm, which chooses samples in a fixed manner, MSTC formulates a low-dimensional representation, a fast perceptual hash algorithm, to dynamically update the long-term historical targets and the medium-term stable scene according to image similarity. MSTC also differs from most tracking-by-detection algorithms, which label samples as positive or negative: it instead investigates a fusion salient sample detection scheme that weights samples not only by distance information but also by visual spatial attention cues such as color, intensity, and texture. Extensive experimental comparisons with state-of-the-art algorithms on the standard 50-video benchmark demonstrate the superiority of the proposed algorithm.
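The record gives only a high-level description of MSTC's two mechanisms, but both can be illustrated with short sketches. First, the fast perceptual hash used to decide whether to update the long-term and medium-term contexts can be approximated with a standard average-hash similarity test. The following is a minimal sketch in Python, assuming an average-hash variant; every name in it (average_hash, hash_similarity, should_update, the 0.85 threshold) is hypothetical rather than taken from the paper:

    import numpy as np

    def average_hash(patch, hash_size=8):
        """Average-hash fingerprint of a 2-D grayscale patch.
        The patch (assumed at least hash_size x hash_size) is block-averaged
        down to hash_size x hash_size, then thresholded against its mean,
        giving a compact binary signature that is cheap to compare."""
        h, w = patch.shape
        h_crop, w_crop = h - h % hash_size, w - w % hash_size  # even blocks
        blocks = patch[:h_crop, :w_crop].reshape(
            hash_size, h_crop // hash_size, hash_size, w_crop // hash_size)
        small = blocks.mean(axis=(1, 3))           # coarse low-dimensional view
        return (small > small.mean()).flatten()    # binary signature

    def hash_similarity(a, b):
        """Similarity in [0, 1]: one minus the normalized Hamming distance."""
        return 1.0 - np.count_nonzero(a != b) / a.size

    def should_update(context_patch, new_patch, threshold=0.85):
        """Admit new_patch into the context pool only if it remains similar
        to the stored context, guarding the update against drift."""
        return hash_similarity(average_hash(context_patch),
                               average_hash(new_patch)) >= threshold

Second, the fusion salient sample detection weights samples by both distance information and visual attention. A plausible reading, again only a sketch and not the paper's exact formulation, is a Gaussian spatial falloff multiplied by a normalized saliency score:

    def sample_weights(distances, saliency, sigma=10.0):
        """Fuse spatial distance and visual saliency into soft sample weights.
        distances: each candidate's distance from the current target estimate.
        saliency:  per-sample attention score (e.g. pooled from color,
                   intensity, and texture channels); higher = more salient."""
        spatial = np.exp(-(distances ** 2) / (2.0 * sigma ** 2))  # falloff
        salient = saliency / (saliency.max() + 1e-12)   # scale to [0, 1]
        weights = spatial * salient                     # multiplicative fusion
        return weights / (weights.sum() + 1e-12)        # normalize to sum 1

Such soft weights replace the hard positive/negative labels the abstract contrasts against: every sample contributes in proportion to how close and how visually salient it is.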
doi 10.1109/TCSVT.2017.2720749
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2018-10, Vol.28 (10), p.2849-2860
issn 1051-8215
1558-2205
language eng
recordid cdi_proquest_journals_2126463217
source IEEE Electronic Library (IEL)
subjects Algorithms
Automobiles
Context
Feature extraction
Image color analysis
Image detection
Machine learning
multi-scale
Optical tracking
perceptual hash
salient sample
sample selection
spatio-temporal
State of the art
Target tracking
Visual tracking
Visualization
title Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T01%3A32%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Visual%20Tracking%20via%20Multi-Scale%20Spatio-Temporal%20Context%20Learning&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Xue,%20Wanli&rft.date=2018-10-01&rft.volume=28&rft.issue=10&rft.spage=2849&rft.epage=2860&rft.pages=2849-2860&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2017.2720749&rft_dat=%3Cproquest_RIE%3E2126463217%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2126463217&rft_id=info:pmid/&rft_ieee_id=7961203&rfr_iscdi=true