3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present a unified representation for actionable spatial perception: 3D
Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent
entities in the scene (e.g. objects, walls, rooms), and edges represent
relations (e.g. inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs)
extend this notion to represent dynamic scenes with moving agents (e.g. humans,
robots), and to include actionable information that supports planning and
decision-making (e.g. spatio-temporal relations, topology at different levels
of abstraction). Our second contribution is to provide the first fully
automatic Spatial PerceptIon eNgine (SPIN) to build a DSG from visual-inertial
data. We integrate state-of-the-art techniques for object and human detection
and pose estimation, and we describe how to robustly infer object, robot, and
human nodes in crowded scenes. To the best of our knowledge, this is the first
paper that reconciles visual-inertial SLAM and dense human mesh tracking.
Moreover, we provide algorithms to obtain hierarchical representations of
indoor environments (e.g. places, structures, rooms) and their relations. Our
third contribution is to demonstrate the proposed spatial perception engine in
a photo-realistic Unity-based simulator, where we assess its robustness and
expressiveness. Finally, we discuss the implications of our proposal on modern
robotics applications. 3D Dynamic Scene Graphs can have a profound impact on
planning and decision-making, human-robot interaction, long-term autonomy, and
scene prediction. A video abstract is available at https://youtu.be/SWbofjhyPzI |
---|---|
DOI: | 10.48550/arxiv.2002.06289 |
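
The summary defines a scene graph as a directed graph whose nodes are entities in the scene and whose edges are relations such as inclusion and adjacency. A minimal Python sketch of that data structure follows. The names used here (`Node`, `SceneGraph`, `layer`, `add_edge`, `targets`) and the relation labels are hypothetical and chosen for illustration only; they are not taken from the paper or from any software released with it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity in the scene (hypothetical schema, not the paper's)."""
    node_id: str
    layer: str  # assumed labels, e.g. "object", "room", "agent"


@dataclass
class SceneGraph:
    """Directed graph: nodes are entities, edges are labeled relations."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, layer):
        self.nodes[node_id] = Node(node_id, layer)

    def add_edge(self, source_id, relation, target_id):
        # Reject edges whose endpoints have not been added yet.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source_id, relation, target_id))

    def targets(self, source_id, relation):
        """Return ids reached from source_id by one edge with this relation."""
        return [t for s, r, t in self.edges if s == source_id and r == relation]


graph = SceneGraph()
graph.add_node("kitchen", "room")
graph.add_node("hallway", "room")
graph.add_node("table", "object")
graph.add_node("person_1", "agent")
graph.add_edge("kitchen", "contains", "table")        # inclusion
graph.add_edge("kitchen", "contains", "person_1")     # inclusion of a moving agent
graph.add_edge("kitchen", "adjacent_to", "hallway")   # adjacency

print(graph.targets("kitchen", "contains"))
```

A dynamic variant along the lines the summary describes would presumably also attach time-stamped state to agent nodes; that part is left out of this sketch because the record does not specify how it is represented.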