CatTrack: Single-Stage Category-Level 6D Object Pose Tracking via Convolution and Vision Transformer

Much existing research has focused on instance-level pose tracking, which requires a 3D model of the object in advance, making it challenging to apply in practice. To address this limitation, some researchers have proposed category-level object pose tracking methods. Achieving...

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on multimedia 2024-01, Vol.26, p.1-16
Main authors: Yu, Sheng, Zhai, Di-Hua, Xia, Yuanqing, Li, Dong, Zhao, Shiqi
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 16
container_issue
container_start_page 1
container_title IEEE transactions on multimedia
container_volume 26
creator Yu, Sheng
Zhai, Di-Hua
Xia, Yuanqing
Li, Dong
Zhao, Shiqi
description Much existing research has focused on instance-level pose tracking, which requires a 3D model of the object in advance, making it challenging to apply in practice. To address this limitation, some researchers have proposed category-level object pose tracking methods. Achieving accurate and fast monocular category-level pose tracking is an essential research goal. In this paper, we propose CatTrack, a new single-stage keypoint-based monocular category-level multi-object pose tracking network. A significant issue in object pose tracking is how to use information from the previous frame to guide pose estimation in the next frame. However, because the object poses and camera information differ from frame to frame, irrelevant information must be removed and useful features emphasized. To this end, we propose a transformer-based temporal information capture module that leverages the position information of keypoints from the previous frame. Furthermore, we propose a new keypoint matching module that enables the grouping and matching of object keypoints in complex scenes. We have applied CatTrack to the Objectron dataset and achieved superior results compared with existing methods. We have also evaluated the generalization of CatTrack and successfully applied it to track the 6D pose of unseen real-world objects. A video is available at https://youtu.be/Yminjdtsgwk .
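The keypoint grouping-and-matching problem the abstract describes can be illustrated with a deliberately simplified sketch. This is not the paper's learned matching module (which is transformer-based); it is a minimal greedy nearest-neighbour association between detected 2D keypoints in consecutive frames, and all names here are hypothetical:

```python
# Hypothetical sketch: greedy nearest-neighbour association of 2D keypoints
# between consecutive frames. CatTrack's actual keypoint matching module is
# learned; this only illustrates the frame-to-frame association problem.

from math import hypot

def match_keypoints(prev, curr, max_dist=50.0):
    """Associate each previous-frame keypoint with at most one
    current-frame keypoint, nearest pair first, within max_dist pixels.
    Returns a dict {prev_index: curr_index}."""
    # All candidate pairs, sorted by Euclidean distance.
    pairs = []
    for i, (px, py) in enumerate(prev):
        for j, (cx, cy) in enumerate(curr):
            pairs.append((hypot(cx - px, cy - py), i, j))
    pairs.sort()

    matches, used_prev, used_curr = {}, set(), set()
    for d, i, j in pairs:
        if d > max_dist:
            break  # remaining pairs are even farther apart
        if i in used_prev or j in used_curr:
            continue  # each keypoint may be matched at most once
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches

prev = [(10.0, 10.0), (100.0, 100.0)]
curr = [(102.0, 101.0), (12.0, 9.0)]
print(match_keypoints(prev, curr))  # → {0: 1, 1: 0}
```

A learned module replaces the hand-set distance threshold and greedy rule with scores predicted from image features, which is what lets it cope with the complex scenes mentioned in the abstract; for an optimal (rather than greedy) assignment one would typically use the Hungarian algorithm.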
doi_str_mv 10.1109/TMM.2023.3284598
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2024-01, Vol.26, p.1-16
issn 1520-9210
1941-0077
language eng
recordid cdi_proquest_journals_2918029130
source IEEE Electronic Library (IEL)
subjects Feature extraction
Matching
Modules
Object tracking
Pose estimation
pose tracking
Target tracking
Task analysis
Three dimensional models
Three-dimensional displays
Tracking networks
transformer
Transformers
title CatTrack: Single-Stage Category-Level 6D Object Pose Tracking via Convolution and Vision Transformer