Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs
This paper addresses the challenge of scaling Large Multimodal Models (LMMs) to expansive 3D environments. Solving this open problem is especially relevant for robot deployment in many first-responder scenarios, such as search-and-rescue missions that cover vast spaces. The use of LMMs in these sett...
Gespeichert in:
Hauptverfasser: | , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper addresses the challenge of scaling Large Multimodal Models (LMMs)
to expansive 3D environments. Solving this open problem is especially relevant
for robot deployment in many first-responder scenarios, such as
search-and-rescue missions that cover vast spaces. The use of LMMs in these
settings is currently hampered by the strict context windows that limit the
LMM's input size. We therefore introduce a novel approach that utilizes a
datagraph structure, which allows the LMM to iteratively query smaller sections
of a large environment. Using the datagraph in conjunction with graph traversal
algorithms, we can prioritize the most relevant locations to the query, thereby
improving the scalability of 3D scene language tasks. We illustrate the
datagraph using 3D scenes, but these can be easily substituted by other dense
modalities that represent the environment, such as pointclouds or Gaussian
splats. We demonstrate the potential to use the datagraph for two 3D scene
language task use cases, in a search-and-rescue mission example. |
---|---|
DOI: | 10.48550/arxiv.2407.10743 |