Text-Free Controllable 3-D Point Cloud Generation

Bibliographic details
Published in: IEEE Transactions on Instrumentation and Measurement, 2024, Vol. 73, p. 1-12
Main authors: Xiao, Haihong, Kang, Wenxiong, Li, Yuqiong, Xu, Hongbin
Format: Article
Language: English
Description
Summary: Generating 3-D shapes with text inputs has long been a peculiar challenge in computer vision, which requires methodological know-how as well as a sense of art. Recently, text-to-image generation has driven remarkable progress, raising tremendous interest in text-guided shape generation, which further paves the way for industrial design. Nevertheless, prior efforts on text-guided 3-D synthesis either lack geometric details, are limited by the simple text input, or need expensive optimization and additional postprocessing, which make them unfriendly for novices. In this research, we present TFCNet, a novel approach for text-free controllable point cloud generation. In the training phase, we first design an empirically robust cross-modal skeletal point generator (CM-SPG) to predict skeletal points of the specific shape conditioned on the single image input. Then, we develop a diffusion-based dense point generator, which takes skeletal points as geometric guidance to produce dense point clouds that are faithful to the input images. In the inference phase, we propose an efficient text-free nonparametric transfer regime, which does not require separate training and can directly generate point cloud shapes while being semantically faithful to the provided text input. As evidenced by our experiments on the ShapeNet(v2) and CO3D datasets, our proposed method outperforms existing state-of-the-art methods both quantitatively and qualitatively.
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2024.3353839