CatTrack: Single-Stage Category-Level 6D Object Pose Tracking via Convolution and Vision Transformer
In current research, many researchers have focused on instance-level pose tracking, which requires a 3D model of the object in advance, making it challenging to apply in practice. To address this limitation, some researchers have proposed category-level object pose tracking methods. Achieving accurate and fast monocular category-level pose tracking is an essential research goal. In this paper, we propose CatTrack, a new single-stage keypoint-based monocular category-level multi-object pose tracking network. A significant issue in object pose tracking is utilizing information from the previous frame to guide pose estimation for the next frame. However, as the object poses and camera information differ from frame to frame, irrelevant information must be removed and useful features emphasized. To this end, we propose a transformer-based temporal information capture module that leverages the position information of keypoints from the previous frame. Furthermore, we propose a new keypoint matching module to enable the grouping and matching of object keypoints in complex scenes. We have applied CatTrack to the Objectron dataset and achieved superior results compared to existing methods. We have also evaluated the generalization of CatTrack and successfully applied it to track the 6D pose of unseen real-world objects. A video is available at https://youtu.be/Yminjdtsgwk.
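The abstract describes a transformer-based temporal module that attends from the current frame to keypoint features of the previous frame. The sketch below is purely illustrative and is not the authors' implementation: it shows single-head cross-attention in NumPy, where current-frame queries weight previous-frame keypoint embeddings, which is one plausible way such temporal information capture could work. All function names, dimensions, and the random projections are assumptions for the example.

```python
# Illustrative sketch (not CatTrack's actual code): cross-attention from
# current-frame keypoint queries to previous-frame keypoint features.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(curr_q, prev_kp, d_model=16, seed=0):
    """curr_q: (N, d) current-frame query features.
    prev_kp: (M, d) previous-frame keypoint embeddings.
    Returns (N, d_model) features fused with previous-frame cues."""
    rng = np.random.default_rng(seed)
    d_in = curr_q.shape[1]
    # Random projection matrices stand in for learned weights.
    Wq = rng.standard_normal((d_in, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((prev_kp.shape[1], d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((prev_kp.shape[1], d_model)) / np.sqrt(d_model)
    Q, K, V = curr_q @ Wq, prev_kp @ Wk, prev_kp @ Wv
    # (N, M) attention weights: how much each current query looks at
    # each previous-frame keypoint.
    attn = softmax(Q @ K.T / np.sqrt(d_model))
    return attn @ V
```

In a real tracker the projections would be learned and the output fed to the pose head, so that keypoint positions from frame t-1 can guide estimation at frame t.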
Published in: | IEEE transactions on multimedia 2024-01, Vol.26, p.1-16 |
---|---|
Main authors: | Yu, Sheng; Zhai, Di-Hua; Xia, Yuanqing; Li, Dong; Zhao, Shiqi |
Format: | Article |
Language: | eng |
Online access: | Request full text |
container_end_page | 16 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on multimedia |
container_volume | 26 |
creator | Yu, Sheng; Zhai, Di-Hua; Xia, Yuanqing; Li, Dong; Zhao, Shiqi |
description | In the current research, many researchers have focused on instance-level pose tracking, which requires a 3D model of the object in advance, making it challenging to apply in practice. To address this limitation, some researchers have proposed the category-level object pose tracking method. Achieving accurate and speedy monocular category-level pose tracking is an essential research goal. In this paper, we propose CatTrack, a new single-stage keypoints-based monocular category-level multi-object pose tracking network. A significant issue in object pose tracking tasks is utilizing the information from the previous frame to guide pose estimation for the next frame. However, as the object poses and camera information in each frame are different, we need to remove irrelevant information and emphasize useful features. To this end, we propose a transformer-based temporal information capture module to leverage the position information of keypoints from the previous frame. Furthermore, we propose a new keypoint matching module to enable the grouping and matching of object keypoints in complex scenes. We have successfully applied CatTrack to the Objectron dataset and achieved superior results in comparison to existing methods. Furthermore, we have also evaluated the generalization of CatTrack and successfully applied it to track the 6D pose of unseen real-world objects. A video is available at https://youtu.be/Yminjdtsgwk . |
doi_str_mv | 10.1109/TMM.2023.3284598 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2024-01, Vol.26, p.1-16 |
issn | 1520-9210 1941-0077 |
language | eng |
recordid | cdi_proquest_journals_2918029130 |
source | IEEE Electronic Library (IEL) |
subjects | Feature extraction; Matching; Modules; Object tracking; Pose estimation; pose tracking; Target tracking; Task analysis; Three dimensional models; Three-dimensional displays; Tracking networks; transformer; Transformers |
title | CatTrack: Single-Stage Category-Level 6D Object Pose Tracking via Convolution and Vision Transformer |