GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

Bibliographic details
Main authors: Zhou, Rui; Liu, Jingbin; Xie, Junbin; Zhang, Jianyu; Hu, Yingze; Zhao, Jiele
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition
creator Zhou, Rui
Liu, Jingbin
Xie, Junbin
Zhang, Jianyu
Hu, Yingze
Zhao, Jiele
description Visual-inertial odometry (VIO) is widely used in various fields, such as robots, drones, and autonomous vehicles, due to its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, compromising the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects present a tough challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging the promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods. This highlights its remarkable generalization and adaptability in diverse dynamic environments, showcasing its potential to handle various dynamic objects in practical applications.
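The abstract describes the pipeline only at a high level: a promptable foundation model segments dynamic-object categories, an enhanced SORT tracker keeps those objects associated across frames and occlusions, and the resulting masks are used to keep dynamic-object observations out of pose estimation. The minimal Python sketch below is not the authors' released code; it only illustrates the final masking step, assuming the per-frame dynamic mask (the union of tracked object masks) has already been produced by the segmentation and tracking stages, which are abstracted here as a plain boolean array. The function name filter_dynamic_features is illustrative.

# Illustrative sketch only, not the authors' implementation. It shows the general
# idea from the abstract: reject visual features that fall on segmentation masks
# of dynamic objects (cars, pedestrians, etc.) before they reach the VIO front end.
# Mask generation (e.g., by a promptable foundation model) and SORT-style tracking
# are assumed to have produced `dynamic_mask` for the current frame.
import numpy as np

def filter_dynamic_features(keypoints: np.ndarray, dynamic_mask: np.ndarray) -> np.ndarray:
    """Keep only keypoints that do not lie on a dynamic-object mask.

    keypoints    : (N, 2) array of (u, v) pixel coordinates from the feature tracker.
    dynamic_mask : (H, W) boolean array, True wherever a tracked dynamic object was segmented.
    Returns the subset of keypoints treated as static and passed on to pose estimation.
    """
    h, w = dynamic_mask.shape
    u = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)
    static = ~dynamic_mask[v, u]          # image arrays are indexed [row, col] = [v, u]
    return keypoints[static]

# Example with synthetic data: a 480x640 frame where one "vehicle" occupies a block.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((480, 640), dtype=bool)
    mask[100:300, 200:500] = True                     # union of per-object masks for this frame
    kps = rng.uniform([0, 0], [640, 480], size=(200, 2))
    static_kps = filter_dynamic_features(kps, mask)
    print(f"kept {len(static_kps)} of {len(kps)} features")

In a full system along the lines described above, the per-object masks would come from the segmentation model and be associated across frames by the SORT-style tracker, so objects that are briefly occluded or move quickly continue to be masked rather than re-entering the pose-estimation stage as static features.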
doi_str_mv 10.48550/arxiv.2411.19289
format Article
identifier DOI: 10.48550/arxiv.2411.19289
language eng
recordid cdi_arxiv_primary_2411_19289
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model