Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos

Depth cameras have recently become popular, and many vision problems can be better solved with depth information. However, how to integrate depth information into a visual tracker to overcome challenges such as occlusion and background distraction is still under-investigated in the current literature on visual tracking.

Bibliographic details
Published in: IEEE transactions on multimedia 2019-03, Vol.21 (3), p.664-677
Main authors: Liu, Ye; Jing, Xiao-Yuan; Nie, Jianhui; Gao, Hao; Liu, Jun; Jiang, Guo-Ping
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 677
container_issue 3
container_start_page 664
container_title IEEE transactions on multimedia
container_volume 21
creator Liu, Ye
Jing, Xiao-Yuan
Nie, Jianhui
Gao, Hao
Liu, Jun
Jiang, Guo-Ping
description Depth cameras have recently become popular, and many vision problems can be better solved with depth information. However, how to integrate depth information into a visual tracker to overcome challenges such as occlusion and background distraction is still under-investigated in the current literature on visual tracking. In this paper, we investigate a 3-D extension of a classical mean-shift tracker, whose greedy gradient ascent strategy is generally considered unreliable in conventional 2-D tracking. However, through careful study of the physical properties of 3-D point clouds, we reveal that objects which may appear to be adjacent in a 2-D image form distinctive modes in the 3-D probability distribution approximated by kernel density estimation, and finding the nearest mode using 3-D mean-shift can always work in tracking. Building on this understanding of 3-D mean-shift, we propose two important mechanisms that further boost the tracker's robustness: one enables the tracker to be aware of potential distractions and make corresponding adjustments to the appearance model; the other enables the tracker to detect and recover from tracking failures caused by total occlusion. The proposed method is both effective and computationally efficient: on a conventional personal computer, it runs at more than 60 FPS without graphics processing unit (GPU) acceleration.
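The mode-seeking step at the heart of the abstract can be sketched in a few lines. This is a minimal illustration of standard mean-shift on a 3-D point cloud, not the authors' implementation: the Gaussian kernel and the bandwidth `h` are assumed choices, and the paper's appearance model, distraction handling, and occlusion recovery are not reproduced here.

```python
import math

def mean_shift_mode(points, start, h=0.05, max_iter=50, tol=1e-6):
    """Follow the kernel-density-estimate gradient from `start` to the
    nearest mode of a 3-D point cloud (one cluster of points = one mode).

    points: iterable of (x, y, z) tuples/lists; start: initial 3-D position;
    h: kernel bandwidth in the same units as the points (assumed value).
    """
    x = list(start)
    for _ in range(max_iter):
        wsum = 0.0
        acc = [0.0, 0.0, 0.0]
        for p in points:
            # Gaussian kernel weight of each cloud point w.r.t. current x.
            d2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
            w = math.exp(-d2 / (2.0 * h * h))
            wsum += w
            for i in range(3):
                acc[i] += w * p[i]
        # The mean-shift update: move to the kernel-weighted mean.
        x_new = [a / wsum for a in acc]
        if math.dist(x_new, x) < tol:
            return x_new
        x = x_new
    return x
```

Because distant points receive near-zero weight, a query started near one cluster converges to that cluster's mode even when another object's points are nearby in the 2-D image but separated in 3-D — the property the abstract relies on.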
doi_str_mv 10.1109/TMM.2018.2863604
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TMM_2018_2863604</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8425768</ieee_id><sourcerecordid>2185732951</sourcerecordid><originalsourceid>FETCH-LOGICAL-c291t-b096f22c460c0caa32d8b4cc39365771ac2bab5bc003f114549023b0446e4e793</originalsourceid><addsrcrecordid>eNo9kN1LwzAUxYsoOKfvgi8BnzNvPto0j3PqJmwMZtXHkmSpy-zambSo_70tGz7dA-ecC-cXRdcERoSAvMsWixEFko5omrAE-Ek0IJITDCDEaadjClhSAufRRQhbAMJjEINoN6mrxv40ePytvEXZxluLH9zOVsHVlSrRwqoKv2xc0aB312zQ0piy7T00U9W6dNUHKmqPVrVuQ4OWemtNgzKvzGdvuQqtpvf4Ab25ta3DZXRWqDLYq-MdRq9Pj9lkhufL6fNkPMeGStJgDTIpKDU8AQNGKUbXqebGMMmSWAiiDNVKx9oAsIJ0S7gEyjRwnlhuhWTD6Pbwd-_rr9aGJt_Wre_mhJySNBaMyph0KTikjK9D8LbI997tlP_NCeQ91LyDmvdQ8yPUrnJzqDhr7X885TQWScr-AFveca0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2185732951</pqid></control><display><type>article</type><title>Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos</title><source>IEEE Electronic Library (IEL)</source><creator>Liu, Ye ; Jing, Xiao-Yuan ; Nie, Jianhui ; Gao, Hao ; Liu, Jun ; Jiang, Guo-Ping</creator><creatorcontrib>Liu, Ye ; Jing, Xiao-Yuan ; Nie, Jianhui ; Gao, Hao ; Liu, Jun ; Jiang, Guo-Ping</creatorcontrib><description>Depth cameras have recently become popular and many vision problems can be better solved with depth information. But, how to integrate depth information into a visual tracker to overcome the challenges such as occlusion and background distraction is still underinvestigated in current literature on visual tracking. In this paper, we investigate a 3-D extension of a classical mean-shift tracker whose greedy gradient ascend strategy is generally considered as unreliable in conventional 2-D tracking. 
However, through careful study of the physical property of 3-D point clouds, we reveal that objects which may appear to be adjacent on a 2-D image will form distinctive modes in the 3-D probability distribution approximated by kernel density estimation, and finding the nearest mode using 3-D mean-shift can always work in tracking. Based on the understanding of 3-D mean-shift, we propose two important mechanisms to further boost the tracker's robustness: one is to enable the tracker to be aware of potential distractions and make corresponding adjustments to the appearance model; and the other is to enable the tracker to detect and recover from tracking failures caused by total occlusion. The proposed method is both effective and computationally efficient. On a conventional personal computer, it runs at more than 60 FPS without graphical processing unit acceleration.</description><identifier>ISSN: 1520-9210</identifier><identifier>EISSN: 1941-0077</identifier><identifier>DOI: 10.1109/TMM.2018.2863604</identifier><identifier>CODEN: ITMUF8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Ambient intelligence ; Cameras ; Histograms ; Image color analysis ; mean-shift ; Occlusion ; Optical tracking ; Personal computers ; point cloud ; RGB-D camera ; Target tracking ; Three dimensional models ; Three-dimensional displays ; Two dimensional displays ; Visual tracking</subject><ispartof>IEEE transactions on multimedia, 2019-03, Vol.21 (3), p.664-677</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c291t-b096f22c460c0caa32d8b4cc39365771ac2bab5bc003f114549023b0446e4e793</citedby><cites>FETCH-LOGICAL-c291t-b096f22c460c0caa32d8b4cc39365771ac2bab5bc003f114549023b0446e4e793</cites><orcidid>0000-0001-9448-0989 ; 0000-0002-4365-4165 ; 0000-0002-2686-3002</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8425768$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54737</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8425768$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Liu, Ye</creatorcontrib><creatorcontrib>Jing, Xiao-Yuan</creatorcontrib><creatorcontrib>Nie, Jianhui</creatorcontrib><creatorcontrib>Gao, Hao</creatorcontrib><creatorcontrib>Liu, Jun</creatorcontrib><creatorcontrib>Jiang, Guo-Ping</creatorcontrib><title>Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos</title><title>IEEE transactions on multimedia</title><addtitle>TMM</addtitle><description>Depth cameras have recently become popular and many vision problems can be better solved with depth information. But, how to integrate depth information into a visual tracker to overcome the challenges such as occlusion and background distraction is still underinvestigated in current literature on visual tracking. In this paper, we investigate a 3-D extension of a classical mean-shift tracker whose greedy gradient ascend strategy is generally considered as unreliable in conventional 2-D tracking. 
However, through careful study of the physical property of 3-D point clouds, we reveal that objects which may appear to be adjacent on a 2-D image will form distinctive modes in the 3-D probability distribution approximated by kernel density estimation, and finding the nearest mode using 3-D mean-shift can always work in tracking. Based on the understanding of 3-D mean-shift, we propose two important mechanisms to further boost the tracker's robustness: one is to enable the tracker to be aware of potential distractions and make corresponding adjustments to the appearance model; and the other is to enable the tracker to detect and recover from tracking failures caused by total occlusion. The proposed method is both effective and computationally efficient. On a conventional personal computer, it runs at more than 60 FPS without graphical processing unit acceleration.</description><subject>Ambient intelligence</subject><subject>Cameras</subject><subject>Histograms</subject><subject>Image color analysis</subject><subject>mean-shift</subject><subject>Occlusion</subject><subject>Optical tracking</subject><subject>Personal computers</subject><subject>point cloud</subject><subject>RGB-D camera</subject><subject>Target tracking</subject><subject>Three dimensional models</subject><subject>Three-dimensional displays</subject><subject>Two dimensional displays</subject><subject>Visual 
tracking</subject><issn>1520-9210</issn><issn>1941-0077</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kN1LwzAUxYsoOKfvgi8BnzNvPto0j3PqJmwMZtXHkmSpy-zambSo_70tGz7dA-ecC-cXRdcERoSAvMsWixEFko5omrAE-Ek0IJITDCDEaadjClhSAufRRQhbAMJjEINoN6mrxv40ePytvEXZxluLH9zOVsHVlSrRwqoKv2xc0aB312zQ0piy7T00U9W6dNUHKmqPVrVuQ4OWemtNgzKvzGdvuQqtpvf4Ab25ta3DZXRWqDLYq-MdRq9Pj9lkhufL6fNkPMeGStJgDTIpKDU8AQNGKUbXqebGMMmSWAiiDNVKx9oAsIJ0S7gEyjRwnlhuhWTD6Pbwd-_rr9aGJt_Wre_mhJySNBaMyph0KTikjK9D8LbI997tlP_NCeQ91LyDmvdQ8yPUrnJzqDhr7X885TQWScr-AFveca0</recordid><startdate>20190301</startdate><enddate>20190301</enddate><creator>Liu, Ye</creator><creator>Jing, Xiao-Yuan</creator><creator>Nie, Jianhui</creator><creator>Gao, Hao</creator><creator>Liu, Jun</creator><creator>Jiang, Guo-Ping</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9448-0989</orcidid><orcidid>https://orcid.org/0000-0002-4365-4165</orcidid><orcidid>https://orcid.org/0000-0002-2686-3002</orcidid></search><sort><creationdate>20190301</creationdate><title>Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos</title><author>Liu, Ye ; Jing, Xiao-Yuan ; Nie, Jianhui ; Gao, Hao ; Liu, Jun ; Jiang, Guo-Ping</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c291t-b096f22c460c0caa32d8b4cc39365771ac2bab5bc003f114549023b0446e4e793</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Ambient 
intelligence</topic><topic>Cameras</topic><topic>Histograms</topic><topic>Image color analysis</topic><topic>mean-shift</topic><topic>Occlusion</topic><topic>Optical tracking</topic><topic>Personal computers</topic><topic>point cloud</topic><topic>RGB-D camera</topic><topic>Target tracking</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Two dimensional displays</topic><topic>Visual tracking</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Ye</creatorcontrib><creatorcontrib>Jing, Xiao-Yuan</creatorcontrib><creatorcontrib>Nie, Jianhui</creatorcontrib><creatorcontrib>Gao, Hao</creatorcontrib><creatorcontrib>Liu, Jun</creatorcontrib><creatorcontrib>Jiang, Guo-Ping</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on multimedia</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liu, Ye</au><au>Jing, Xiao-Yuan</au><au>Nie, Jianhui</au><au>Gao, Hao</au><au>Liu, Jun</au><au>Jiang, Guo-Ping</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D 
Videos</atitle><jtitle>IEEE transactions on multimedia</jtitle><stitle>TMM</stitle><date>2019-03-01</date><risdate>2019</risdate><volume>21</volume><issue>3</issue><spage>664</spage><epage>677</epage><pages>664-677</pages><issn>1520-9210</issn><eissn>1941-0077</eissn><coden>ITMUF8</coden><abstract>Depth cameras have recently become popular and many vision problems can be better solved with depth information. But, how to integrate depth information into a visual tracker to overcome the challenges such as occlusion and background distraction is still underinvestigated in current literature on visual tracking. In this paper, we investigate a 3-D extension of a classical mean-shift tracker whose greedy gradient ascend strategy is generally considered as unreliable in conventional 2-D tracking. However, through careful study of the physical property of 3-D point clouds, we reveal that objects which may appear to be adjacent on a 2-D image will form distinctive modes in the 3-D probability distribution approximated by kernel density estimation, and finding the nearest mode using 3-D mean-shift can always work in tracking. Based on the understanding of 3-D mean-shift, we propose two important mechanisms to further boost the tracker's robustness: one is to enable the tracker to be aware of potential distractions and make corresponding adjustments to the appearance model; and the other is to enable the tracker to detect and recover from tracking failures caused by total occlusion. The proposed method is both effective and computationally efficient. On a conventional personal computer, it runs at more than 60 FPS without graphical processing unit acceleration.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TMM.2018.2863604</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-9448-0989</orcidid><orcidid>https://orcid.org/0000-0002-4365-4165</orcidid><orcidid>https://orcid.org/0000-0002-2686-3002</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2019-03, Vol.21 (3), p.664-677
issn 1520-9210
1941-0077
language eng
recordid cdi_crossref_primary_10_1109_TMM_2018_2863604
source IEEE Electronic Library (IEL)
subjects Ambient intelligence
Cameras
Histograms
Image color analysis
mean-shift
Occlusion
Optical tracking
Personal computers
point cloud
RGB-D camera
Target tracking
Three dimensional models
Three-dimensional displays
Two dimensional displays
Visual tracking
title Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T12%3A07%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Context-Aware%20Three-Dimensional%20Mean-Shift%20With%20Occlusion%20Handling%20for%20Robust%20Object%20Tracking%20in%20RGB-D%20Videos&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Liu,%20Ye&rft.date=2019-03-01&rft.volume=21&rft.issue=3&rft.spage=664&rft.epage=677&rft.pages=664-677&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2018.2863604&rft_dat=%3Cproquest_RIE%3E2185732951%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2185732951&rft_id=info:pmid/&rft_ieee_id=8425768&rfr_iscdi=true