Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot

Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a u...

Bibliographic Details
Main authors:	Yu, Justin; Hari, Kush; Srinivas, Kishore; El-Refai, Karim; Rashid, Adam; Kim, Chung Min; Kerr, Justin; Cheng, Richard; Irshad, Muhammad Zubair; Balakrishna, Ashwin; Kollar, Thomas; Goldberg, Ken
Format:	Article
Language:	eng
Subjects:	Computer Science - Robotics

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Yu, Justin ; Hari, Kush ; Srinivas, Kishore ; El-Refai, Karim ; Rashid, Adam ; Kim, Chung Min ; Kerr, Justin ; Cheng, Richard ; Irshad, Muhammad Zubair ; Balakrishna, Ashwin ; Kollar, Thomas ; Goldberg, Ken
description	Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.
doi_str_mv	10.48550/arxiv.2409.18108
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2409_18108</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2409_18108</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2409_181083</originalsourceid><addsrcrecordid>eNqFzj0PgjAUheEuDkb9AU7eUQcQFBJ01CCa6ALu5AJXbFJa0hY__r3BuDud4bzDw9jU99wgCkNvifrFH-4q8DauH_leNGTqjLLusCYnbgqqKqogwc4YjhKyVqA1MD_HSbbYwkmWmhqSFoV4w67jouKyhlSpxslKFAQptZpMX1iupIEnt3dAuKiC968qlB2zwQ2FoclvR2x2iK_7o_Ol5a3mDep33hPzL3H9v_gASPBHIQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot</title><source>arXiv.org</source><creator>Yu, Justin ; Hari, Kush ; Srinivas, Kishore ; El-Refai, Karim ; Rashid, Adam ; Kim, Chung Min ; Kerr, Justin ; Cheng, Richard ; Irshad, Muhammad Zubair ; Balakrishna, Ashwin ; Kollar, Thomas ; Goldberg, Ken</creator><creatorcontrib>Yu, Justin ; Hari, Kush ; Srinivas, Kishore ; El-Refai, Karim ; Rashid, Adam ; Kim, Chung Min ; Kerr, Justin ; Cheng, Richard ; Irshad, Muhammad Zubair ; Balakrishna, Ashwin ; Kollar, Thomas ; Goldberg, Ken</creatorcontrib><description>Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. 
We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.</description><identifier>DOI: 10.48550/arxiv.2409.18108</identifier><language>eng</language><subject>Computer Science - Robotics</subject><creationdate>2024-09</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2409.18108$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2409.18108$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Yu, Justin</creatorcontrib><creatorcontrib>Hari, Kush</creatorcontrib><creatorcontrib>Srinivas, Kishore</creatorcontrib><creatorcontrib>El-Refai, Karim</creatorcontrib><creatorcontrib>Rashid, Adam</creatorcontrib><creatorcontrib>Kim, Chung Min</creatorcontrib><creatorcontrib>Kerr, Justin</creatorcontrib><creatorcontrib>Cheng, Richard</creatorcontrib><creatorcontrib>Irshad, Muhammad Zubair</creatorcontrib><creatorcontrib>Balakrishna, Ashwin</creatorcontrib><creatorcontrib>Kollar, Thomas</creatorcontrib><creatorcontrib>Goldberg, Ken</creatorcontrib><title>Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot</title><description>Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, 
stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.</description><subject>Computer Science - Robotics</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFzj0PgjAUheEuDkb9AU7eUQcQFBJ01CCa6ALu5AJXbFJa0hY__r3BuDud4bzDw9jU99wgCkNvifrFH-4q8DauH_leNGTqjLLusCYnbgqqKqogwc4YjhKyVqA1MD_HSbbYwkmWmhqSFoV4w67jouKyhlSpxslKFAQptZpMX1iupIEnt3dAuKiC968qlB2zwQ2FoclvR2x2iK_7o_Ol5a3mDep33hPzL3H9v_gASPBHIQ</recordid><startdate>20240926</startdate><enddate>20240926</enddate><creator>Yu, Justin</creator><creator>Hari, Kush</creator><creator>Srinivas, Kishore</creator><creator>El-Refai, Karim</creator><creator>Rashid, Adam</creator><creator>Kim, Chung Min</creator><creator>Kerr, Justin</creator><creator>Cheng, Richard</creator><creator>Irshad, Muhammad Zubair</creator><creator>Balakrishna, Ashwin</creator><creator>Kollar, Thomas</creator><creator>Goldberg, Ken</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240926</creationdate><title>Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile 
Robot</title><author>Yu, Justin ; Hari, Kush ; Srinivas, Kishore ; El-Refai, Karim ; Rashid, Adam ; Kim, Chung Min ; Kerr, Justin ; Cheng, Richard ; Irshad, Muhammad Zubair ; Balakrishna, Ashwin ; Kollar, Thomas ; Goldberg, Ken</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2409_181083</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Robotics</topic><toplevel>online_resources</toplevel><creatorcontrib>Yu, Justin</creatorcontrib><creatorcontrib>Hari, Kush</creatorcontrib><creatorcontrib>Srinivas, Kishore</creatorcontrib><creatorcontrib>El-Refai, Karim</creatorcontrib><creatorcontrib>Rashid, Adam</creatorcontrib><creatorcontrib>Kim, Chung Min</creatorcontrib><creatorcontrib>Kerr, Justin</creatorcontrib><creatorcontrib>Cheng, Richard</creatorcontrib><creatorcontrib>Irshad, Muhammad Zubair</creatorcontrib><creatorcontrib>Balakrishna, Ashwin</creatorcontrib><creatorcontrib>Kollar, Thomas</creatorcontrib><creatorcontrib>Goldberg, Ken</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yu, Justin</au><au>Hari, Kush</au><au>Srinivas, Kishore</au><au>El-Refai, Karim</au><au>Rashid, Adam</au><au>Kim, Chung Min</au><au>Kerr, Justin</au><au>Cheng, Richard</au><au>Irshad, Muhammad Zubair</au><au>Balakrishna, Ashwin</au><au>Kollar, Thomas</au><au>Goldberg, Ken</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot</atitle><date>2024-09-26</date><risdate>2024</risdate><abstract>Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. 
We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.</abstract><doi>10.48550/arxiv.2409.18108</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2409.18108
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2409_18108
source	arXiv.org
subjects	Computer Science - Robotics
title	Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T12%3A11%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Language-Embedded%20Gaussian%20Splats%20(LEGS):%20Incrementally%20Building%20Room-Scale%20Representations%20with%20a%20Mobile%20Robot&rft.au=Yu,%20Justin&rft.date=2024-09-26&rft_id=info:doi/10.48550/arxiv.2409.18108&rft_dat=%3Carxiv_GOX%3E2409_18108%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true