6D Object Pose Estimation with Compact Generalized Non-local Operation

Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bou...

Detailed description

Bibliographic details
Published in:	IEEE access 2024-11, Vol.12, p.1-1
Main authors:	Jiang, Changhong; Mu, Xiaoqiao; Zhang, Bingbing; Liang, Chao; Xie, Mujun
Format:	Article
Language:	eng
Subjects:	Accuracy; Computational modeling; Correlation; Correlations; End-to-end; Feature extraction; Fine-grained Details; Long-range Spatiotemporal; Pose estimation; Predictive models; Representational Power; Solid modeling; Subtle Feature; Three-dimensional displays; Training; YOLO
Online access:	Full text

container_end_page	1
container_issue
container_start_page	1
container_title	IEEE access
container_volume	12
creator	Jiang, Changhong; Mu, Xiaoqiao; Zhang, Bingbing; Liang, Chao; Xie, Mujun
description	Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.
doi_str_mv	10.1109/ACCESS.2024.3508772
format	Article
fullrecord	<record><control><sourceid>doaj_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10771728</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10771728</ieee_id><doaj_id>oai_doaj_org_article_4abda70ec3204d2987fb4961b9427e16</doaj_id><sourcerecordid>oai_doaj_org_article_4abda70ec3204d2987fb4961b9427e16</sourcerecordid><originalsourceid>FETCH-LOGICAL-d1322-5abc12c2a2cef010e0d4980bc7705e317aaaac257a0d1014301fb5d6689f328a3</originalsourceid><addsrcrecordid>eNo9jtFKw0AQRRdBsNR-gT7kB1JnZpPs5rHEVgvFCtXnMNmdaEraLUlA9OuNVrwvF84Mh6vUDcIcEfK7RVEsd7s5ASVznYI1hi7UhDDLY53q7ErN-n4PY-yIUjNRq-w-2lZ7cUP0HHqJlv3QHHhowjH6aIb3qAiHE4_HBzlKx23zJT56Cse4DY7baHsa4c_ztbqsue1l9tdT9bpavhSP8Wb7sC4Wm9ijJopTrhySIyYnNSAI-CS3UDljIBWNhsc4Sg2DR8BEA9ZV6rPM5rUmy3qq1mevD7wvT924tfssAzflLwjdW8nd0LhWyoQrzwbEaYLEU25NXSV5hlWekBHMRtft2dWIyL8LwRg0ZPU346th5A</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>6D Object Pose Estimation with Compact Generalized Non-local Operation</title><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>IEEE Xplore Open Access Journals</source><creator>Jiang, Changhong ; Mu, Xiaoqiao ; Zhang, Bingbing ; Liang, Chao ; Xie, Mujun</creator><creatorcontrib>Jiang, Changhong ; Mu, Xiaoqiao ; Zhang, Bingbing ; Liang, Chao ; Xie, Mujun</creatorcontrib><description>Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. 
The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.</description><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3508772</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Computational modeling ; Correlation ; Correlations ; End-to-end ; Feature extraction ; Fine-grained Details ; Long-range Spatiotemporal ; Pose estimation ; Predictive models ; Representational Power ; Solid modeling ; Subtle Feature ; Three-dimensional displays ; Training ; YOLO</subject><ispartof>IEEE access, 2024-11, Vol.12, p.1-1</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0009-0009-3127-1157 ; 0000-0001-9646-6179 ; 0000-0002-4734-4164 ; 0009-0001-6084-6900 ; 0000-0002-4984-6504</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10771728$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>315,781,785,865,2103,27638,27929,27930,54938</link.rule.ids></links><search><creatorcontrib>Jiang, Changhong</creatorcontrib><creatorcontrib>Mu, 
Xiaoqiao</creatorcontrib><creatorcontrib>Zhang, Bingbing</creatorcontrib><creatorcontrib>Liang, Chao</creatorcontrib><creatorcontrib>Xie, Mujun</creatorcontrib><title>6D Object Pose Estimation with Compact Generalized Non-local Operation</title><title>IEEE access</title><addtitle>Access</addtitle><description>Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. 
Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.</description><subject>Accuracy</subject><subject>Computational modeling</subject><subject>Correlation</subject><subject>Correlations</subject><subject>End-to-end</subject><subject>Feature extraction</subject><subject>Fine-grained Details</subject><subject>Long-range Spatiotemporal</subject><subject>Pose estimation</subject><subject>Predictive models</subject><subject>Representational Power</subject><subject>Solid modeling</subject><subject>Subtle Feature</subject><subject>Three-dimensional displays</subject><subject>Training</subject><subject>YOLO</subject><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNo9jtFKw0AQRRdBsNR-gT7kB1JnZpPs5rHEVgvFCtXnMNmdaEraLUlA9OuNVrwvF84Mh6vUDcIcEfK7RVEsd7s5ASVznYI1hi7UhDDLY53q7ErN-n4PY-yIUjNRq-w-2lZ7cUP0HHqJlv3QHHhowjH6aIb3qAiHE4_HBzlKx23zJT56Cse4DY7baHsa4c_ztbqsue1l9tdT9bpavhSP8Wb7sC4Wm9ijJopTrhySIyYnNSAI-CS3UDljIBWNhsc4Sg2DR8BEA9ZV6rPM5rUmy3qq1mevD7wvT924tfssAzflLwjdW8nd0LhWyoQrzwbEaYLEU25NXSV5hlWekBHMRtft2dWIyL8LwRg0ZPU346th5A</recordid><startdate>20241128</startdate><enddate>20241128</enddate><creator>Jiang, Changhong</creator><creator>Mu, Xiaoqiao</creator><creator>Zhang, Bingbing</creator><creator>Liang, Chao</creator><creator>Xie, 
Mujun</creator><general>IEEE</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>DOA</scope><orcidid>https://orcid.org/0009-0009-3127-1157</orcidid><orcidid>https://orcid.org/0000-0001-9646-6179</orcidid><orcidid>https://orcid.org/0000-0002-4734-4164</orcidid><orcidid>https://orcid.org/0009-0001-6084-6900</orcidid><orcidid>https://orcid.org/0000-0002-4984-6504</orcidid></search><sort><creationdate>20241128</creationdate><title>6D Object Pose Estimation with Compact Generalized Non-local Operation</title><author>Jiang, Changhong ; Mu, Xiaoqiao ; Zhang, Bingbing ; Liang, Chao ; Xie, Mujun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d1322-5abc12c2a2cef010e0d4980bc7705e317aaaac257a0d1014301fb5d6689f328a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Computational modeling</topic><topic>Correlation</topic><topic>Correlations</topic><topic>End-to-end</topic><topic>Feature extraction</topic><topic>Fine-grained Details</topic><topic>Long-range Spatiotemporal</topic><topic>Pose estimation</topic><topic>Predictive models</topic><topic>Representational Power</topic><topic>Solid modeling</topic><topic>Subtle Feature</topic><topic>Three-dimensional displays</topic><topic>Training</topic><topic>YOLO</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Changhong</creatorcontrib><creatorcontrib>Mu, Xiaoqiao</creatorcontrib><creatorcontrib>Zhang, Bingbing</creatorcontrib><creatorcontrib>Liang, Chao</creatorcontrib><creatorcontrib>Xie, Mujun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>DOAJ Directory of 
Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jiang, Changhong</au><au>Mu, Xiaoqiao</au><au>Zhang, Bingbing</au><au>Liang, Chao</au><au>Xie, Mujun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>6D Object Pose Estimation with Compact Generalized Non-local Operation</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2024-11-28</date><risdate>2024</risdate><volume>12</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. 
Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.</abstract><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3508772</doi><tpages>1</tpages><orcidid>https://orcid.org/0009-0009-3127-1157</orcidid><orcidid>https://orcid.org/0000-0001-9646-6179</orcidid><orcidid>https://orcid.org/0000-0002-4734-4164</orcidid><orcidid>https://orcid.org/0009-0001-6084-6900</orcidid><orcidid>https://orcid.org/0000-0002-4984-6504</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2169-3536
ispartof	IEEE access, 2024-11, Vol.12, p.1-1
issn	2169-3536
language	eng
recordid	cdi_ieee_primary_10771728
source	DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; IEEE Xplore Open Access Journals
subjects	Accuracy; Computational modeling; Correlation; Correlations; End-to-end; Feature extraction; Fine-grained Details; Long-range Spatiotemporal; Pose estimation; Predictive models; Representational Power; Solid modeling; Subtle Feature; Three-dimensional displays; Training; YOLO
title	6D Object Pose Estimation with Compact Generalized Non-local Operation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T07%3A03%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-doaj_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=6D%20Object%20Pose%20Estimation%20with%20Compact%20Generalized%20Non-local%20Operation&rft.jtitle=IEEE%20access&rft.au=Jiang,%20Changhong&rft.date=2024-11-28&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3508772&rft_dat=%3Cdoaj_ieee_%3Eoai_doaj_org_article_4abda70ec3204d2987fb4961b9427e16%3C/doaj_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10771728&rft_doaj_id=oai_doaj_org_article_4abda70ec3204d2987fb4961b9427e16&rfr_iscdi=true