DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification

Large pre-trained models have had a significant impact on computer vision by enabling multi-modal learning; among them, the CLIP model has achieved impressive results in image classification, object detection, and semantic segmentation. However, the model's performance on 3D point cloud processing ta...

Bibliographic Details
Published in:	arXiv.org 2024-05
Main Authors:	Shen, Sitian; Zhu, Zilin; Fan, Linqian; Zhang, Harry; Wu, Xinxiao
Format:	Article
Language:	eng
Subjects:	Accuracy; Classification; Computer vision; Domains; Image classification; Image segmentation; Object recognition; Semantic segmentation; Three dimensional models; Training
Online Access:	Full text