Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2018-10, Vol. 28 (10), p. 2849-2860
Main authors: Xue, Wanli; Xu, Chao; Feng, Zhiyong
Format: Article
Language: English (eng)
Online access: Order full text
container_end_page 2860
container_issue 10
container_start_page 2849
container_title IEEE transactions on circuits and systems for video technology
container_volume 28
creator Xue, Wanli; Xu, Chao; Feng, Zhiyong
description To tackle the incomplete and inaccurate samples used by most tracking-by-detection algorithms, this paper presents an object tracking algorithm termed multi-scale spatio-temporal context (MSTC) learning tracking. MSTC collaboratively explores three different types of spatio-temporal context, namely the long-term historical targets, the medium-term stable scene (i.e., a short, continuous, and stable video sequence), and the short-term overall samples, to improve tracking efficiency and reduce drift. Unlike the conventional multi-timescale tracking paradigm, which chooses samples in a fixed manner, MSTC formulates a low-dimensional representation, a fast perceptual hash algorithm, to dynamically update the long-term historical targets and the medium-term stable scene according to image similarity. MSTC also differs from most tracking-by-detection algorithms, which label samples as positive or negative: it instead investigates a fusion salient sample detection scheme that weights samples not only by distance information but also by visual spatial attention cues such as color, intensity, and texture. Extensive experimental comparisons with state-of-the-art algorithms on the standard 50-video benchmark demonstrate the superiority of the proposed algorithm.
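The record gives only a high-level description of MSTC's two mechanisms, but both can be illustrated with short sketches. First, the fast perceptual hash used to decide whether to update the long-term and medium-term contexts can be approximated with a standard average-hash similarity test. The following is a minimal sketch in Python, assuming an average-hash variant; every name in it (average_hash, hash_similarity, should_update, the 0.85 threshold) is hypothetical rather than taken from the paper:

    import numpy as np

    def average_hash(patch, hash_size=8):
        """Average-hash fingerprint of a 2-D grayscale patch.
        The patch (assumed at least hash_size x hash_size) is block-averaged
        down to hash_size x hash_size, then thresholded against its mean,
        giving a compact binary signature that is cheap to compare."""
        h, w = patch.shape
        h_crop, w_crop = h - h % hash_size, w - w % hash_size  # even blocks
        blocks = patch[:h_crop, :w_crop].reshape(
            hash_size, h_crop // hash_size, hash_size, w_crop // hash_size)
        small = blocks.mean(axis=(1, 3))           # coarse low-dimensional view
        return (small > small.mean()).flatten()    # binary signature

    def hash_similarity(a, b):
        """Similarity in [0, 1]: one minus the normalized Hamming distance."""
        return 1.0 - np.count_nonzero(a != b) / a.size

    def should_update(context_patch, new_patch, threshold=0.85):
        """Admit new_patch into the context pool only if it remains similar
        to the stored context, guarding the update against drift."""
        return hash_similarity(average_hash(context_patch),
                               average_hash(new_patch)) >= threshold

Second, the fusion salient sample detection weights samples by both distance information and visual attention. A plausible reading, again only a sketch and not the paper's exact formulation, is a Gaussian spatial falloff multiplied by a normalized saliency score:

    def sample_weights(distances, saliency, sigma=10.0):
        """Fuse spatial distance and visual saliency into soft sample weights.
        distances: each candidate's distance from the current target estimate.
        saliency:  per-sample attention score (e.g. pooled from color,
                   intensity, and texture channels); higher = more salient."""
        spatial = np.exp(-(distances ** 2) / (2.0 * sigma ** 2))  # falloff
        salient = saliency / (saliency.max() + 1e-12)   # scale to [0, 1]
        weights = spatial * salient                     # multiplicative fusion
        return weights / (weights.sum() + 1e-12)        # normalize to sum 1

Such soft weights replace the hard positive/negative labels the abstract contrasts against: every sample contributes in proportion to how close and how visually salient it is.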
doi 10.1109/TCSVT.2017.2720749
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2018-10, Vol.28 (10), p.2849-2860
issn 1051-8215
1558-2205
language eng
recordid cdi_proquest_journals_2126463217
source IEEE Electronic Library (IEL)
subjects Algorithms
Automobiles
Context
Feature extraction
Image color analysis
Image detection
Machine learning
multi-scale
Optical tracking
perceptual hash
salient sample
sample selection
spatio-temporal
State of the art
Target tracking
Visual tracking
Visualization
title Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T01%3A32%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Visual%20Tracking%20via%20Multi-Scale%20Spatio-Temporal%20Context%20Learning&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Xue,%20Wanli&rft.date=2018-10-01&rft.volume=28&rft.issue=10&rft.spage=2849&rft.epage=2860&rft.pages=2849-2860&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2017.2720749&rft_dat=%3Cproquest_RIE%3E2126463217%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2126463217&rft_id=info:pmid/&rft_ieee_id=7961203&rfr_iscdi=true