What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files
Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) soft...
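Main authors: | Meltzer, Peter; Lambourne, Joseph G; Grandi, Daniele |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language; Computer Science - Learning |
Online access: | Order full text |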
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele |
description | Semantic knowledge of part-part and part-whole relationships in assemblies is
useful for a variety of tasks from searching design repositories to the
construction of engineering knowledge bases. In this work we propose that the
natural language names designers use in Computer Aided Design (CAD) software
are a valuable source of such knowledge, and that Large Language Models (LLMs)
contain useful domain-specific information for working with this data as well
as other CAD and engineering-related tasks.
In particular we extract and clean a large corpus of natural language part,
feature and document names and use this to quantitatively demonstrate that a
pre-trained language model can outperform numerous benchmarks on three
self-supervised tasks, without ever having seen this data before. Moreover, we
show that fine-tuning on the text data corpus further boosts the performance on
all tasks, thus demonstrating the value of the text data which until now has
been largely ignored. We also identify key limitations to using LLMs with text
data alone, and our findings provide a strong motivation for further work into
multi-modal text-geometry models.
To aid and encourage further work in this area we make all our data and code
publicly available. |
doi_str_mv | 10.48550/arxiv.2304.14275 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2304_14275</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2304_14275</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-a6d7da6a378deace3d86b35c245553522722c1800b276dcaeede2c33f7b87c363</originalsourceid><addsrcrecordid>eNotUMFOhDAU7MWDWf0AT_bmCWRbSrkZgrtqRN3ENR7Jo30LTQqYFlD-3l30MjOXmckMIVfrKIxTIaJbcD9mChmP4nAdMynOyfzZwHDjqeko0Fdo8Y5uJrAjDKaraeY9tpWdgx24gb5jC91gFH3u-m-LusaTrYCuHuGoX3qN1tOhcf1YN_TDowt2rp-MRr1ELy15dk-3xqK_IGcHsB4v_3lF9tvNPn8MireHpzwrAkikOIKWGhLgMtUICrlOk4oLxWIhBBeMScbUOo2iislEK0DUyBTnB1mlUvGEr8j1X-yyvfxypgU3l6cPyuUD_gs-6VdR</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><source>arXiv.org</source><creator>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</creator><creatorcontrib>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</creatorcontrib><description>Semantic knowledge of part-part and part-whole relationships in assemblies is
useful for a variety of tasks from searching design repositories to the
construction of engineering knowledge bases. In this work we propose that the
natural language names designers use in Computer Aided Design (CAD) software
are a valuable source of such knowledge, and that Large Language Models (LLMs)
contain useful domain-specific information for working with this data as well
as other CAD and engineering-related tasks.
In particular we extract and clean a large corpus of natural language part,
feature and document names and use this to quantitatively demonstrate that a
pre-trained language model can outperform numerous benchmarks on three
self-supervised tasks, without ever having seen this data before. Moreover, we
show that fine-tuning on the text data corpus further boosts the performance on
all tasks, thus demonstrating the value of the text data which until now has
been largely ignored. We also identify key limitations to using LLMs with text
data alone, and our findings provide a strong motivation for further work into
multi-modal text-geometry models.
To aid and encourage further work in this area we make all our data and code
publicly available.</description><identifier>DOI: 10.48550/arxiv.2304.14275</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2023-04</creationdate><rights>http://creativecommons.org/licenses/by-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2304.14275$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2304.14275$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Meltzer, Peter</creatorcontrib><creatorcontrib>Lambourne, Joseph G</creatorcontrib><creatorcontrib>Grandi, Daniele</creatorcontrib><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotUMFOhDAU7MWDWf0AT_bmCWRbSrkZgrtqRN3ENR7Jo30LTQqYFlD-3l30MjOXmckMIVfrKIxTIaJbcD9mChmP4nAdMynOyfzZwHDjqeko0Fdo8Y5uJrAjDKaraeY9tpWdgx24gb5jC91gFH3u-m-LusaTrYCuHuGoX3qN1tOhcf1YN_TDowt2rp-MRr1ELy15dk-3xqK_IGcHsB4v_3lF9tvNPn8MireHpzwrAkikOIKWGhLgMtUICrlOk4oLxWIhBBeMScbUOo2iislEK0DUyBTnB1mlUvGEr8j1X-yyvfxypgU3l6cPyuUD_gs-6VdR</recordid><startdate>20230425</startdate><enddate>20230425</enddate><creator>Meltzer, Peter</creator><creator>Lambourne, Joseph G</creator><creator>Grandi, Daniele</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230425</creationdate><title>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</title><author>Meltzer, Peter ; Lambourne, Joseph G ; Grandi, Daniele</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-a6d7da6a378deace3d86b35c245553522722c1800b276dcaeede2c33f7b87c363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Meltzer, Peter</creatorcontrib><creatorcontrib>Lambourne, Joseph G</creatorcontrib><creatorcontrib>Grandi, Daniele</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Meltzer, Peter</au><au>Lambourne, Joseph G</au><au>Grandi, Daniele</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files</atitle><date>2023-04-25</date><risdate>2023</risdate><doi>10.48550/arxiv.2304.14275</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2304.14275 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2304_14275 |
source | arXiv.org |
subjects | Computer Science - Computation and Language ; Computer Science - Learning |
title | What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T10%3A16%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=What's%20in%20a%20Name?%20Evaluating%20Assembly-Part%20Semantic%20Knowledge%20in%20Language%20Models%20through%20User-Provided%20Names%20in%20CAD%20Files&rft.au=Meltzer,%20Peter&rft.date=2023-04-25&rft_id=info:doi/10.48550/arxiv.2304.14275&rft_dat=%3Carxiv_GOX%3E2304_14275%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |