What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files
Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) soft...
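Published in: | arXiv.org 2023-04 |
---|---|
Main authors: | Meltzer, Peter; Lambourne, Joseph G; Grandi, Daniele |
Format: | Article |
Language: | eng |
Subjects: | CAD; Computer aided design; Knowledge bases (artificial intelligence); Language; Natural language; Natural language processing; Semantics |
Online access: | Full text |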
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Meltzer, Peter; Lambourne, Joseph G; Grandi, Daniele |
description | Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2807202890</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2807202890</sourcerecordid><originalsourceid>FETCH-proquest_journals_28072028903</originalsourceid><addsrcrecordid>eNqNjd1qAjEUhIMgKOo7HPDCq4WYrbq9EvEHoVaEtngpR_d0N5JNNCex-Pb-0Afwagbmm5maaKo07SfZm1IN0WE-SinVcKQGg7QprtsSQ49BW0BYY0VjmF_QRAzaFjBhpmpvrskGfYAvqtAGfYAP6_4M5QU9aiu0RcS7_3Q5GYZQeheLEn6YfLLx7qJzyp_Tz5fpZAYLbYjbov6Lhqnzry3RXcy_p8vk5N05Eofd0UVv79FOZXKkpMreZfoadQOuGUxH</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2807202890</pqid></control><display><type>article</type><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><source>Freely Accessible Journals</source><creator>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</creator><creatorcontrib>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</creatorcontrib><description>Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. 
Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>CAD ; Computer aided design ; Knowledge bases (artificial intelligence) ; Language ; Natural language ; Natural language processing ; Semantics</subject><ispartof>arXiv.org, 2023-04</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Meltzer, Peter</creatorcontrib><creatorcontrib>Lambourne, Joseph G</creatorcontrib><creatorcontrib>Grandi, Daniele</creatorcontrib><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><title>arXiv.org</title><description>Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. 
In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.</description><subject>CAD</subject><subject>Computer aided design</subject><subject>Knowledge bases (artificial intelligence)</subject><subject>Language</subject><subject>Natural language</subject><subject>Natural language processing</subject><subject>Semantics</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjd1qAjEUhIMgKOo7HPDCq4WYrbq9EvEHoVaEtngpR_d0N5JNNCex-Pb-0Afwagbmm5maaKo07SfZm1IN0WE-SinVcKQGg7QprtsSQ49BW0BYY0VjmF_QRAzaFjBhpmpvrskGfYAvqtAGfYAP6_4M5QU9aiu0RcS7_3Q5GYZQeheLEn6YfLLx7qJzyp_Tz5fpZAYLbYjbov6Lhqnzry3RXcy_p8vk5N05Eofd0UVv79FOZXKkpMreZfoadQOuGUxH</recordid><startdate>20230425</startdate><enddate>20230425</enddate><creator>Meltzer, 
Peter</creator><creator>Lambourne, Joseph G</creator><creator>Grandi, Daniele</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20230425</creationdate><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><author>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28072028903</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>CAD</topic><topic>Computer aided design</topic><topic>Knowledge bases (artificial intelligence)</topic><topic>Language</topic><topic>Natural language</topic><topic>Natural language processing</topic><topic>Semantics</topic><toplevel>online_resources</toplevel><creatorcontrib>Meltzer, Peter</creatorcontrib><creatorcontrib>Lambourne, Joseph G</creatorcontrib><creatorcontrib>Grandi, Daniele</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium 
Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Meltzer, Peter</au><au>Lambourne, Joseph G</au><au>Grandi, Daniele</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</atitle><jtitle>arXiv.org</jtitle><date>2023-04-25</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. 
We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2807202890 |
source | Freely Accessible Journals |
subjects | CAD; Computer aided design; Knowledge bases (artificial intelligence); Language; Natural language; Natural language processing; Semantics |
title | What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T10%3A01%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=What's%20in%20a%20Name?%20Evaluating%20Assembly-Part%20Semantic%20Knowledge%20in%20Language%20Models%20through%20User-Provided%20Names%20in%20CAD%20Files&rft.jtitle=arXiv.org&rft.au=Meltzer,%20Peter&rft.date=2023-04-25&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2807202890%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2807202890&rft_id=info:pmid/&rfr_iscdi=true |