DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification

Large pre-trained models have had a significant impact on computer vision by enabling multi-modal learning, where the CLIP model has achieved impressive results in image classification, object detection, and semantic segmentation. However, the model's performance on 3D point cloud processing ta...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2024-05
Hauptverfasser: Shen, Sitian, Zhu, Zilin, Fan, Linqian, Zhang, Harry, Wu, Xinxiao
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!