Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation

Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask proposal models; consequently, the information that could come from 2D association to 3D was not fully exploited. This bias towards 3D data, while effective for familiar indoor objects, limits the system's adaptability to new and varied object types, where 2D models offer greater utility. Addressing this gap, we introduce Zero-Shot Dual-Path Integration Framework that equally values the contributions of both 3D and 2D modalities. Our framework comprises three components: 3D pathway, 2D pathway, and Dual-Path Integration. 3D pathway generates spatially accurate class-agnostic mask proposals of common indoor objects from 3D point cloud data using a pre-trained 3D model, while 2D pathway utilizes pre-trained open-vocabulary instance segmentation model to identify a diverse array of object proposals from multi-view RGB-D images. In Dual-Path Integration, our Conditional Integration process, which operates in two stages, filters and merges the proposals from both pathways adaptively. This process harmonizes output proposals to enhance segmentation capabilities. Our framework, utilizing pre-trained models in a zero-shot manner, is model-agnostic and demonstrates superior performance on both seen and unseen data, as evidenced by comprehensive evaluations on the ScanNet200 and qualitative results on ARKitScenes datasets.
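The two-stage Conditional Integration process described in this record's abstract (filtering and merging class-agnostic mask proposals from the 3D and 2D pathways) might be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the boolean point-mask representation, the IoU criterion, and the `merge_thr`/`keep_thr` thresholds are all assumptions.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean point masks of equal length."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def conditional_integration(masks_3d, masks_2d, merge_thr=0.8, keep_thr=0.3):
    """Illustrative two-stage integration of 3D- and 2D-pathway proposals.

    Stage 1 (merge): a 2D-derived proposal whose overlap with some 3D
    proposal exceeds `merge_thr` is folded into it (union of points).
    Stage 2 (filter): a remaining 2D proposal is kept only if its best
    overlap with every 3D proposal stays below `keep_thr`, i.e. it
    plausibly describes an object the 3D pathway missed.
    """
    merged = [m.copy() for m in masks_3d]
    novel = []
    for m2 in masks_2d:
        ious = [mask_iou(m2, m3) for m3 in merged]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= merge_thr:
            merged[best] = np.logical_or(merged[best], m2)  # stage 1
        elif not ious or max(ious) < keep_thr:
            novel.append(m2)  # stage 2: low overlap everywhere, keep
    return merged + novel
```

Proposals with intermediate overlap (between the two thresholds) are discarded as redundant; how the actual framework resolves such cases is not specified in the abstract.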

Bibliographic Details
Published in: arXiv.org, 2024-08
Main authors: Tri Ton, Ji Woo Hong, Eom, SooHwan, Shim, Jun Yeop, Kim, Junyeong, Yoo, Chang D
Format: Article
Language: English
Online access: Full text
Identifier: EISSN 2331-8422
Source: Free E-Journals
Subjects:
Image enhancement
Image segmentation
Instance segmentation
Proposals
Qualitative analysis
Three dimensional models
Two dimensional models
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T14%3A29%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Zero-Shot%20Dual-Path%20Integration%20Framework%20for%20Open-Vocabulary%203D%20Instance%20Segmentation&rft.jtitle=arXiv.org&rft.au=Tri%20Ton&rft.date=2024-08-16&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3094563097%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3094563097&rft_id=info:pmid/&rfr_iscdi=true