GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

Bibliographic details
Main authors: Zhou, Rui; Liu, Jingbin; Xie, Junbin; Zhang, Jianyu; Hu, Yingze; Zhao, Jiele
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition
creator Zhou, Rui
Liu, Jingbin
Xie, Junbin
Zhang, Jianyu
Hu, Yingze
Zhao, Jiele
description Visual-inertial odometry (VIO) is widely used in various fields, such as robots, drones, and autonomous vehicles, due to its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, compromising the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects present a tough challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging the promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods. This highlights its remarkable generalization and adaptability in diverse dynamic environments, showcasing its potential to handle various dynamic objects in practical applications.
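The abstract describes the pipeline only at a high level: a promptable foundation model segments dynamic-object categories, an enhanced SORT tracker keeps those objects associated across frames and occlusions, and the resulting masks are used to keep dynamic-object observations out of pose estimation. The minimal Python sketch below is not the authors' released code; it only illustrates the final masking step, assuming the per-frame dynamic mask (the union of tracked object masks) has already been produced by the segmentation and tracking stages, which are abstracted here as a plain boolean array. The function name filter_dynamic_features is illustrative.

# Illustrative sketch only, not the authors' implementation. It shows the general
# idea from the abstract: reject visual features that fall on segmentation masks
# of dynamic objects (cars, pedestrians, etc.) before they reach the VIO front end.
# Mask generation (e.g., by a promptable foundation model) and SORT-style tracking
# are assumed to have produced `dynamic_mask` for the current frame.
import numpy as np

def filter_dynamic_features(keypoints: np.ndarray, dynamic_mask: np.ndarray) -> np.ndarray:
    """Keep only keypoints that do not lie on a dynamic-object mask.

    keypoints    : (N, 2) array of (u, v) pixel coordinates from the feature tracker.
    dynamic_mask : (H, W) boolean array, True wherever a tracked dynamic object was segmented.
    Returns the subset of keypoints treated as static and passed on to pose estimation.
    """
    h, w = dynamic_mask.shape
    u = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)
    static = ~dynamic_mask[v, u]          # image arrays are indexed [row, col] = [v, u]
    return keypoints[static]

# Example with synthetic data: a 480x640 frame where one "vehicle" occupies a block.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((480, 640), dtype=bool)
    mask[100:300, 200:500] = True                     # union of per-object masks for this frame
    kps = rng.uniform([0, 0], [640, 480], size=(200, 2))
    static_kps = filter_dynamic_features(kps, mask)
    print(f"kept {len(static_kps)} of {len(kps)} features")

In a full system along the lines described above, the per-object masks would come from the segmentation model and be associated across frames by the SORT-style tracker, so objects that are briefly occluded or move quickly continue to be masked rather than re-entering the pose-estimation stage as static features.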
doi_str_mv 10.48550/arxiv.2411.19289
format Article
identifier DOI: 10.48550/arxiv.2411.19289
language eng
recordid cdi_arxiv_primary_2411_19289
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model