GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Visual-inertial odometry (VIO) is widely used in various fields, such as
robots, drones, and autonomous vehicles, due to its low cost and complementary
sensors. Most VIO methods presuppose that observed objects are static and
time-invariant. However, real-world scenes often feature dynamic objects,
compromising the accuracy of pose estimation. These moving entities include
cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial
occlusion of these objects present a tough challenge for existing dynamic
object removal techniques. To tackle this challenge, we introduce GMS-VINS,
which integrates an enhanced SORT algorithm along with a robust multi-category
segmentation framework into VIO, thereby improving pose estimation accuracy in
environments with diverse dynamic objects and frequent occlusions. Leveraging
the promptable foundation model, our solution efficiently tracks and segments a
wide range of object categories. The enhanced SORT algorithm significantly
improves the reliability of tracking multiple dynamic objects, especially in
urban settings with partial occlusions or swift movements. We evaluated our
proposed method using multiple public datasets representing various scenes, as
well as in a real-world scenario involving diverse dynamic objects. The
experimental results demonstrate that our proposed method performs impressively
in multiple scenarios, outperforming other state-of-the-art methods. This
highlights its remarkable generalization and adaptability in diverse dynamic
environments, showcasing its potential to handle various dynamic objects in
practical applications. |
---|---|
DOI: | 10.48550/arxiv.2411.19289 |