CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches

Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models,...

Bibliographic Details
Main authors: Wu, Sifan; Khasahmadi, Amir; Katz, Mor; Jayaraman, Pradeep Kumar; Pu, Yewen; Willis, Karl; Liu, Bang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Wu, Sifan
Khasahmadi, Amir
Katz, Mor
Jayaraman, Pradeep Kumar
Pu, Yewen
Willis, Karl
Liu, Bang
description Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.
doi_str_mv 10.48550/arxiv.2409.17457
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2409_17457</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2409_17457</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2409_174573</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DM0NzE152QIdE5MCfPxtVJwKspMSc_MS1fwScxLL01MT1VIzEtRCMsszszPU8jMUyjJSFVwT81LLUosAYnkpykEJBYl5qaWFGUmKzg7uigEZ6eWJGekFvMwsKYl5hSn8kJpbgZ5N9cQZw9dsOXxBUWZuYlFlfEgR8SDHWFMWAUAdyc7gw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><source>arXiv.org</source><creator>Wu, Sifan ; Khasahmadi, Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</creator><creatorcontrib>Wu, Sifan ; Khasahmadi, Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</creatorcontrib><description>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. 
Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</description><identifier>DOI: 10.48550/arxiv.2409.17457</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2024-09</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2409.17457$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2409.17457$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Sifan</creatorcontrib><creatorcontrib>Khasahmadi, Amir</creatorcontrib><creatorcontrib>Katz, Mor</creatorcontrib><creatorcontrib>Jayaraman, Pradeep Kumar</creatorcontrib><creatorcontrib>Pu, Yewen</creatorcontrib><creatorcontrib>Willis, Karl</creatorcontrib><creatorcontrib>Liu, Bang</creatorcontrib><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><description>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. 
We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjGw1DM0NzE152QIdE5MCfPxtVJwKspMSc_MS1fwScxLL01MT1VIzEtRCMsszszPU8jMUyjJSFVwT81LLUosAYnkpykEJBYl5qaWFGUmKzg7uigEZ6eWJGekFvMwsKYl5hSn8kJpbgZ5N9cQZw9dsOXxBUWZuYlFlfEgR8SDHWFMWAUAdyc7gw</recordid><startdate>20240925</startdate><enddate>20240925</enddate><creator>Wu, Sifan</creator><creator>Khasahmadi, Amir</creator><creator>Katz, Mor</creator><creator>Jayaraman, Pradeep Kumar</creator><creator>Pu, Yewen</creator><creator>Willis, Karl</creator><creator>Liu, Bang</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240925</creationdate><title>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</title><author>Wu, Sifan ; Khasahmadi, 
Amir ; Katz, Mor ; Jayaraman, Pradeep Kumar ; Pu, Yewen ; Willis, Karl ; Liu, Bang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2409_174573</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Wu, Sifan</creatorcontrib><creatorcontrib>Khasahmadi, Amir</creatorcontrib><creatorcontrib>Katz, Mor</creatorcontrib><creatorcontrib>Jayaraman, Pradeep Kumar</creatorcontrib><creatorcontrib>Pu, Yewen</creatorcontrib><creatorcontrib>Willis, Karl</creatorcontrib><creatorcontrib>Liu, Bang</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wu, Sifan</au><au>Khasahmadi, Amir</au><au>Katz, Mor</au><au>Jayaraman, Pradeep Kumar</au><au>Pu, Yewen</au><au>Willis, Karl</au><au>Liu, Bang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches</atitle><date>2024-09-25</date><risdate>2024</risdate><abstract>Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. 
In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.</abstract><doi>10.48550/arxiv.2409.17457</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2409.17457
ispartof
issn
language eng
recordid cdi_arxiv_primary_2409_17457
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
title CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T11%3A17%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CadVLM:%20Bridging%20Language%20and%20Vision%20in%20the%20Generation%20of%20Parametric%20CAD%20Sketches&rft.au=Wu,%20Sifan&rft.date=2024-09-25&rft_id=info:doi/10.48550/arxiv.2409.17457&rft_dat=%3Carxiv_GOX%3E2409_17457%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true