3DCoMPaT\(^{++}\): An improved Large-scale 3D Vision Dataset for Compositional Recognition

In this work, we present 3DCoMPaT\(^{++}\), a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT\(^{++}\) covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR 2023, showcasing the winning method's utilization of a modified PointNet\(^{++}\) model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision.
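
To make the quantities in the abstract concrete, the following is a minimal sketch, assuming NumPy only, of what a single stylized shape could look like as the "6D input" mentioned above (XYZ coordinates concatenated with RGB color) and one possible way to organize a Grounded CoMPaT Recognition (GCR) prediction. The array sizes, dictionary keys, and part/material names are illustrative assumptions, not the dataset's actual interface.

```python
# Minimal illustrative sketch (not the authors' code or the dataset's actual API).
# It shows a 6D point-cloud input -- XYZ coordinates concatenated with RGB color,
# as used by the winning challenge model -- and one hypothetical layout for a
# Grounded CoMPaT Recognition (GCR) prediction.
import numpy as np

N_POINTS = 2048  # hypothetical number of points sampled per stylized shape

# An RGB point cloud for one stylized shape: an (N, 6) array of (x, y, z, r, g, b).
xyz = np.random.rand(N_POINTS, 3).astype(np.float32)   # stand-in 3D coordinates
rgb = np.random.rand(N_POINTS, 3).astype(np.float32)   # stand-in colors in [0, 1]
points_6d = np.concatenate([xyz, rgb], axis=1)          # shape: (N_POINTS, 6)

# A GCR prediction couples three outputs for the same shape:
#   1) the shape category (one of 41 classes),
#   2) a material label for each part instance (293 material classes),
#   3) a per-point part-instance labeling that grounds those parts in 3D.
gcr_prediction = {
    "shape_category": "chair",                                   # placeholder class
    "part_materials": {"seat_1": "leather", "leg_1": "metal"},   # placeholder parts/materials
    "point_part_instance": np.zeros(N_POINTS, dtype=np.int64),   # grounding over points
}

print(points_6d.shape, gcr_prediction["shape_category"])
```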

Bibliographic Details
Published in: arXiv.org, 2024-03
Authors: Habib Slim; Li, Xiang; Li, Yuchen; Ahmed, Mahmoud; Mohamed, Ayman; Upadhyay, Ujjwal; Abdelreheem, Ahmed; Prajapati, Arpit; Pothigara, Suhail; Wonka, Peter; Elhoseiny, Mohamed
Format: Article
Language: English
Subjects: Datasets; Recognition; Three dimensional models
Online access: Full text
EISSN: 2331-8422