An analysis of precision: occlusion and perspective geometry’s role in 6D pose estimation

Achieving precise 6 degrees of freedom (6D) pose estimation of rigid objects from color images is a critical challenge with wide-ranging applications in robotics and close-contact aircraft operations. This study investigates key techniques in the application of the YOLOv5 object detection convolutional neural network (CNN) for 6D pose localization of aircraft using only color imagery. Traditional object detection labeling methods suffer from inaccuracies due to perspective geometry and being limited to visible key points. This research demonstrates that with precise labeling, a CNN can predict object features with near-pixel accuracy, effectively learning the distinct appearance of the object due to perspective distortion with a pinhole camera. Additionally, we highlight the crucial role of knowledge about occluded features. Training the CNN with such knowledge slightly reduces pixel precision, but enables the prediction of 3 times more features, including those that are not initially visible, resulting in an overall better performing 6D system. Notably, we reveal that the data augmentation technique of scale can interfere with pixel precision when used during training. These findings are crucial for the entire system, which leverages the Solve Perspective-N-Point (Solve-PnP) algorithm, achieving 6D pose accuracy within 1° and 7 cm at distances ranging from 7.5 to 35 m from the camera. Moreover, this solution operates in real-time, achieving sub-10 ms processing times on a desktop PC.

Detailed description

Saved in:
Bibliographic details
Published in: Neural computing & applications 2024, Vol.36 (3), p.1261-1281
Main authors: Choate, Jeffrey, Worth, Derek, Nykl, Scott, Taylor, Clark, Borghetti, Brett, Schubert Kabban, Christine
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1281
container_issue 3
container_start_page 1261
container_title Neural computing & applications
container_volume 36
creator Choate, Jeffrey
Worth, Derek
Nykl, Scott
Taylor, Clark
Borghetti, Brett
Schubert Kabban, Christine
description Achieving precise 6 degrees of freedom (6D) pose estimation of rigid objects from color images is a critical challenge with wide-ranging applications in robotics and close-contact aircraft operations. This study investigates key techniques in the application of YOLOv5 object detection convolutional neural network (CNN) for 6D pose localization of aircraft using only color imagery. Traditional object detection labeling methods suffer from inaccuracies due to perspective geometry and being limited to visible key points. This research demonstrates that with precise labeling, a CNN can predict object features with near-pixel accuracy, effectively learning the distinct appearance of the object due to perspective distortion with a pinhole camera. Additionally, we highlight the crucial role of knowledge about occluded features. Training the CNN with such knowledge slightly reduces pixel precision, but enables the prediction of 3 times more features, including those that are not initially visible, resulting in an overall better performing 6D system. Notably, we reveal that the data augmentation technique of scale can interfere with pixel precision when used during training. These findings are crucial for the entire system, which leverages the Solve Perspective-N-Point (Solve-PnP) algorithm, achieving 6D pose accuracy within 1° and 7 cm at distances ranging from 7.5 to 35 m from the camera. Moreover, this solution operates in real-time, achieving sub-10 ms processing times on a desktop PC.
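The pinhole-camera perspective geometry the abstract refers to can be sketched as follows: 3D object features project to pixel coordinates via the intrinsic matrix and the 6D pose (rotation and translation), and Solve-PnP inverts exactly this mapping from 2D–3D correspondences. This is a minimal illustration, not the authors' implementation; all numeric values (intrinsics, pose, model points) are made-up placeholders.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates with a pinhole model."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# Placeholder intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # identity rotation for simplicity
t = np.array([0.0, 0.0, 10.0])         # object 10 m in front of the camera

X = np.array([[0.0, 0.0, 0.0],         # placeholder object feature points (m)
              [1.0, 0.0, 0.0]])
print(project(X, K, R, t))             # pixel locations of the two features
```

Given CNN-predicted pixel locations of such features (visible and occluded alike) and their known 3D positions on the aircraft model, a PnP solver recovers the rotation and translation that best explain the observed projections.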
doi_str_mv 10.1007/s00521-023-09094-8
format Article
fulltext fulltext
identifier ISSN: 0941-0643
ispartof Neural computing & applications, 2024, Vol.36 (3), p.1261-1281
issn 0941-0643
1433-3058
language eng
recordid cdi_proquest_journals_2910703160
source Springer Nature - Complete Springer Journals
subjects Aircraft
Algorithms
Artificial Intelligence
Artificial neural networks
Color imagery
Computational Biology/Bioinformatics
Computational Science and Engineering
Computer Science
Data augmentation
Data Mining and Knowledge Discovery
Image Processing and Computer Vision
Labeling
Object recognition
Occlusion
Original Article
Pinhole cameras
Pixels
Pose estimation
Probability and Statistics in Computer Science
Robotics
Training
title An analysis of precision: occlusion and perspective geometry’s role in 6D pose estimation