InstanceVO: Self-Supervised Semantic Visual Odometry by Using Metric Learning to Incorporate Geometrical Priors in Instance Objects

Bibliographic details
Published in: IEEE Robotics and Automation Letters, 2024-11, Vol. 9 (11), p. 10708-10715
Main authors: Xie, Yuanyan; Yang, Junzhe; Zhou, Huaidong; Sun, Fuchun
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 10715
container_issue 11
container_start_page 10708
container_title IEEE robotics and automation letters
container_volume 9
creator Xie, Yuanyan
Yang, Junzhe
Zhou, Huaidong
Sun, Fuchun
description Visual odometry is one of the key technologies for unmanned ground vehicles. To improve system robustness and enable intelligent tasks, researchers have introduced learning-based recognition modules into visual odometry systems, but did not achieve tight coupling between the visual odometry systems and the recognition modules. This letter proposes a self-supervised semantic visual odometry method that completes the tasks of ego-motion estimation, depth prediction, and instance segmentation with a shared encoder. Potential dynamic regions are removed and the image reconstruction loss is rectified by instance detection results. Moreover, an instance-guided triplet loss and cross-task self-attention modules are devised to learn the geometrical relationships among pixels that are implied in instance object priors. The proposed method is validated on the KITTI and ComplexUrban datasets. The experimental results show that our method outperforms baseline models in both pose estimation and depth prediction. We also discuss the efficacy of evaluation metrics for pose estimation and consider the accumulated errors of trajectories.
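The abstract names two training signals: a photometric reconstruction loss rectified by instance detection results (potential dynamic regions are masked out), and an instance-guided triplet loss for metric learning. As a rough sketch of what such losses compute — the paper's exact formulations are not given in this record, so the function names, the margin value, and the L1 photometric error are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss on embedding vectors: pull the anchor
    toward the positive and push it from the negative by at least `margin`.
    An instance-guided variant would sample positives/negatives from the
    same vs. different instance masks."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def masked_photometric_loss(target, reconstructed, dynamic_mask):
    """Mean absolute reconstruction error over static pixels only;
    pixels flagged as potentially dynamic (mask == 1) by instance
    detection are excluded from the loss."""
    static = dynamic_mask == 0
    return np.abs(target[static] - reconstructed[static]).mean()
```

For example, a triplet whose anchor already sits closer to the positive than to the negative by more than the margin incurs zero loss, and masking a moving-object region simply drops those pixels from the photometric average.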
doi_str_mv 10.1109/LRA.2024.3477292
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 2377-3766
ispartof IEEE robotics and automation letters, 2024-11, Vol.9 (11), p.10708-10715
issn 2377-3766
2377-3766
language eng
recordid cdi_proquest_journals_3118090677
source IEEE Electronic Library (IEL)
subjects Computer architecture
Depth prediction
ego-motion estimation
Feature extraction
Image reconstruction
Image segmentation
Instance segmentation
Learning
Measurement
Modules
Motion simulation
Odometry
Pose estimation
Robustness
self-supervised learning
semantic understanding
Semantics
Task complexity
Unmanned ground vehicles
Visual odometry
Visual tasks
title InstanceVO: Self-Supervised Semantic Visual Odometry by Using Metric Learning to Incorporate Geometrical Priors in Instance Objects
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A50%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=InstanceVO:%20Self-Supervised%20Semantic%20Visual%20Odometry%20by%20Using%20Metric%20Learning%20to%20Incorporate%20Geometrical%20Priors%20in%20Instance%20Objects&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Xie,%20Yuanyan&rft.date=2024-11-01&rft.volume=9&rft.issue=11&rft.spage=10708&rft.epage=10715&rft.pages=10708-10715&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2024.3477292&rft_dat=%3Cproquest_RIE%3E3118090677%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3118090677&rft_id=info:pmid/&rft_ieee_id=10711206&rfr_iscdi=true