CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches

Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models,...

Bibliographic details
Main authors:	Wu, Sifan; Khasahmadi, Amir; Katz, Mor; Jayaraman, Pradeep Kumar; Pu, Yewen; Willis, Karl; Liu, Bang
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
Online access:	Order full text

creator	Wu, Sifan; Khasahmadi, Amir; Katz, Mor; Jayaraman, Pradeep Kumar; Pu, Yewen; Willis, Karl; Liu, Bang
description	Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.
doi_str_mv	10.48550/arxiv.2409.17457
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2409_17457</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2409_17457</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2409_174573</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DM0NzE152QIdE5MCfPxtVJwKspMSc_MS1fwScxLL01MT1VIzEtRCMsszszPU8jMUyjJSFVwT81LLUosAYnkpykEJBYl5qaWFGUmKzg7uigEZ6eWJGekFvMwsKYl5hSn8kJpbgZ5N9cQZw9dsOXxBUWZuYlFlfEgR8SDHWFMWAUAdyc7gw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><source>arXiv.org</source><creator>Wu, Sifan ; Khasahmadi, Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</creator><creatorcontrib>Wu, Sifan ; Khasahmadi, Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</creatorcontrib><description>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. 
Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</description><identifier>DOI: 10.48550/arxiv.2409.17457</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2024-09</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2409.17457$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2409.17457$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Sifan</creatorcontrib><creatorcontrib>Khasahmadi, Amir</creatorcontrib><creatorcontrib>Katz, Mor</creatorcontrib><creatorcontrib>Jayaraman, Pradeep Kumar</creatorcontrib><creatorcontrib>Pu, Yewen</creatorcontrib><creatorcontrib>Willis, Karl</creatorcontrib><creatorcontrib>Liu, Bang</creatorcontrib><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><description>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. 
We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DM0NzE152QIdE5MCfPxtVJwKspMSc_MS1fwScxLL01MT1VIzEtRCMsszszPU8jMUyjJSFVwT81LLUosAYnkpykEJBYl5qaWFGUmKzg7uigEZ6eWJGekFvMwsKYl5hSn8kJpbgZ5N9cQZw9dsOXxBUWZuYlFlfEgR8SDHWFMWAUAdyc7gw</recordid><startdate>20240925</startdate><enddate>20240925</enddate><creator>Wu, Sifan</creator><creator>Khasahmadi, Amir</creator><creator>Katz, Mor</creator><creator>Jayaraman, Pradeep Kumar</creator><creator>Pu, Yewen</creator><creator>Willis, Karl</creator><creator>Liu, Bang</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240925</creationdate><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><author>Wu, Sifan ; Khasahmadi, 
Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2409_174573</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Wu, Sifan</creatorcontrib><creatorcontrib>Khasahmadi, Amir</creatorcontrib><creatorcontrib>Katz, Mor</creatorcontrib><creatorcontrib>Jayaraman, Pradeep Kumar</creatorcontrib><creatorcontrib>Pu, Yewen</creatorcontrib><creatorcontrib>Willis, Karl</creatorcontrib><creatorcontrib>Liu, Bang</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wu, Sifan</au><au>Khasahmadi, Amir</au><au>Katz, Mor</au><au>Jayaraman, Pradeep Kumar</au><au>Pu, Yewen</au><au>Willis, Karl</au><au>Liu, Bang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</atitle><date>2024-09-25</date><risdate>2024</risdate><abstract>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. 
In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</abstract><doi>10.48550/arxiv.2409.17457</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2409.17457
language	eng
recordid	cdi_arxiv_primary_2409_17457
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
title	CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T11%3A17%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CadVLM:%20Bridging%20Language%20and%20Vision%20in%20the%20Generation%20of%20Parametric%20CAD%20Sketches&rft.au=Wu,%20Sifan&rft.date=2024-09-25&rft_id=info:doi/10.48550/arxiv.2409.17457&rft_dat=%3Carxiv_GOX%3E2409_17457%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
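The abstract mentions "sketch primitive sequences" and the CAD autocompletion task but does not specify how a sketch is turned into a sequence. As a rough illustration only, the following Python sketch assumes a hypothetical text serialization: each primitive becomes a type tag followed by coordinates normalized to [-1, 1] and quantized to 6-bit integers, and autocompletion is framed as predicting the serialized tail of a sketch from its first few primitives. The names `Line`, `Circle`, `quantize`, `serialize`, and `autocompletion_pair`, the tag strings, and the 6-bit resolution are all invented for this example; they are not the representation used by CadVLM.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical primitive types (assumption: not taken from CadVLM).
@dataclass
class Line:
    start: Tuple[float, float]
    end: Tuple[float, float]


@dataclass
class Circle:
    center: Tuple[float, float]
    radius: float


Primitive = Union[Line, Circle]


def quantize(v: float, bits: int = 6) -> int:
    """Map a value in [-1, 1] to an integer bin in [0, 2**bits - 1], clamping out-of-range values."""
    levels = 2 ** bits - 1
    v = max(-1.0, min(1.0, v))
    return round((v + 1.0) / 2.0 * levels)


def serialize(p: Primitive) -> str:
    """Turn one primitive into a hypothetical tag-plus-integers token string."""
    if isinstance(p, Line):
        x0, y0 = p.start
        x1, y1 = p.end
        return f"<line> {quantize(x0)} {quantize(y0)} {quantize(x1)} {quantize(y1)}"
    if isinstance(p, Circle):
        cx, cy = p.center
        return f"<circle> {quantize(cx)} {quantize(cy)} {quantize(p.radius)}"
    raise TypeError(f"unsupported primitive: {type(p).__name__}")


def autocompletion_pair(sketch: List[Primitive], n_prefix: int) -> Tuple[str, str]:
    """Split a sketch into (prompt, target) strings: the first n_prefix primitives vs. the rest."""
    tokens = [serialize(p) for p in sketch]
    return " ".join(tokens[:n_prefix]), " ".join(tokens[n_prefix:])


if __name__ == "__main__":
    sketch = [Line((-1, -1), (1, -1)), Line((1, -1), (1, 1)), Circle((0.0, 0.0), 0.5)]
    prompt, target = autocompletion_pair(sketch, 2)
    print("prompt:", prompt)
    print("target:", target)
```

A language model trained on such pairs would receive `prompt` and be asked to generate `target`; quantizing coordinates keeps the vocabulary small, at the cost of limiting geometric precision to the chosen bin size.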