InstanceVO: Self-Supervised Semantic Visual Odometry by Using Metric Learning to Incorporate Geometrical Priors in Instance Objects

Bibliographic details
Published in: IEEE Robotics and Automation Letters, 2024-11, Vol. 9 (11), p. 10708-10715
Main authors: Xie, Yuanyan; Yang, Junzhe; Zhou, Huaidong; Sun, Fuchun
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 10715
container_issue 11
container_start_page 10708
container_title IEEE robotics and automation letters
container_volume 9
creator Xie, Yuanyan
Yang, Junzhe
Zhou, Huaidong
Sun, Fuchun
description Visual odometry is one of the key technologies for unmanned ground vehicles. To improve system robustness and enable intelligent tasks, researchers have introduced learning-based recognition modules into visual odometry systems, but did not achieve tight coupling between the visual odometry systems and the recognition modules. This letter proposes a self-supervised semantic visual odometry method that completes the tasks of ego-motion estimation, depth prediction, and instance segmentation with a shared encoder. Potential dynamic regions are removed and the image reconstruction loss is rectified by instance detection results. Moreover, an instance-guided triplet loss and cross-task self-attention modules are devised to learn the geometrical relationships among pixels that are implied in instance object priors. The proposed method is validated on the KITTI and ComplexUrban datasets. The experimental results show that our method outperforms baseline models in both pose estimation and depth prediction. We also discuss the efficacy of evaluation metrics for pose estimation and consider the accumulated errors of trajectories.
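The abstract names two training signals: a photometric reconstruction loss rectified by instance detection results (potential dynamic regions are masked out), and an instance-guided triplet loss for metric learning. As a rough sketch of what such losses compute — the paper's exact formulations are not given in this record, so the function names, the margin value, and the L1 photometric error are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss on embedding vectors: pull the anchor
    toward the positive and push it from the negative by at least `margin`.
    An instance-guided variant would sample positives/negatives from the
    same vs. different instance masks."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def masked_photometric_loss(target, reconstructed, dynamic_mask):
    """Mean absolute reconstruction error over static pixels only;
    pixels flagged as potentially dynamic (mask == 1) by instance
    detection are excluded from the loss."""
    static = dynamic_mask == 0
    return np.abs(target[static] - reconstructed[static]).mean()
```

For example, a triplet whose anchor already sits closer to the positive than to the negative by more than the margin incurs zero loss, and masking a moving-object region simply drops those pixels from the photometric average.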
doi_str_mv 10.1109/LRA.2024.3477292
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 2377-3766
ispartof IEEE robotics and automation letters, 2024-11, Vol.9 (11), p.10708-10715
issn 2377-3766
2377-3766
language eng
recordid cdi_proquest_journals_3118090677
source IEEE Electronic Library (IEL)
subjects Computer architecture
Depth prediction
ego-motion estimation
Feature extraction
Image reconstruction
Image segmentation
Instance segmentation
Learning
Measurement
Modules
Motion simulation
Odometry
Pose estimation
Robustness
self-supervised learning
semantic understanding
Semantics
Task complexity
Unmanned ground vehicles
Visual odometry
Visual tasks
title InstanceVO: Self-Supervised Semantic Visual Odometry by Using Metric Learning to Incorporate Geometrical Priors in Instance Objects
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A50%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=InstanceVO:%20Self-Supervised%20Semantic%20Visual%20Odometry%20by%20Using%20Metric%20Learning%20to%20Incorporate%20Geometrical%20Priors%20in%20Instance%20Objects&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Xie,%20Yuanyan&rft.date=2024-11-01&rft.volume=9&rft.issue=11&rft.spage=10708&rft.epage=10715&rft.pages=10708-10715&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2024.3477292&rft_dat=%3Cproquest_RIE%3E3118090677%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3118090677&rft_id=info:pmid/&rft_ieee_id=10711206&rfr_iscdi=true