ModelSet: a dataset for machine learning in model-driven engineering

The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software models. There are several reasons for this, including the lack of large collections of good quality models, the difficul...

Detailed description

Bibliographic details
Published in:	Software and systems modeling 2022-06, Vol.21 (3), p.967-986
Main authors:	López, José Antonio Hernández; Cánovas Izquierdo, Javier Luis; Cuadrado, Jesús Sánchez
Format:	Article
Language:	eng
Subjects:	Algorithms; Clustering; Compilers; Computer Science; Datasets; Information Systems Applications (incl.Internet); Interpreters; IT in Business; Labeling; Labels; Machine learning; Programming Languages; Programming Techniques; Search engines; Software; Software Engineering; Software Engineering/Programming and Operating Systems; Theme Section Paper; Tooling
Online access:	Full text

container_end_page	986
container_issue	3
container_start_page	967
container_title	Software and systems modeling
container_volume	21
creator	López, José Antonio Hernández; Cánovas Izquierdo, Javier Luis; Cuadrado, Jesús Sánchez
description	The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software models. There are several reasons for this, including the lack of large collections of good quality models, the difficulty of labelling models due to the required domain expertise, and the relative immaturity of the application of ML to MDE. In this work, we present ModelSet, a labelled dataset of software models intended to enable the application of ML to address software modelling problems. To create it, we have devised a method designed to facilitate the exploration and labelling of model datasets by interactively grouping similar models using off-the-shelf technologies like a search engine. We have built an Eclipse plug-in to support the labelling process, which we have used to label 5,466 Ecore meta-models and 5,120 UML models with their category as the main label plus additional secondary labels of interest. We have evaluated the ability of our labelling method to create meaningful groups of models in order to speed up the process, improving the effectiveness of classical clustering methods. We showcase the usefulness of the dataset by applying it in a real scenario: enhancing the MAR search engine. We use ModelSet to train models able to infer useful metadata to navigate search results. The dataset and the tooling are available at https://figshare.com/s/5a6c02fa8ed20782935c and a live version at http://modelset.github.io.
doi_str_mv	10.1007/s10270-021-00929-3
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2654886584</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2654886584</sourcerecordid><originalsourceid>FETCH-LOGICAL-c363t-912d5526175ea113efd6566a57a2876b9f6bacc127541371e2e7244fa1a69a9d3</originalsourceid><addsrcrecordid>eNp9kE1PwzAMhiMEEhPsD3CKxDkQ58NpuKHxKQ1xAM5R1rqjaGtH0iHx7-koghsnW_L7vrYfxk5AnoGU7jyDVE4KqUBI6ZUXeo9NAMEL0M7s__aIh2yac7OQ0ijvDeKEXT10Fa2eqL_gkVexj5l6XneJr2P52rTEVxRT27RL3rR8vdOKKjUf1HJql8Oc0jA7Zgd1XGWa_tQj9nJz_Ty7E_PH2_vZ5VyUGnUvPKjKWoXgLEUATXWFFjFaF1XhcOFrXMSyBOWsGS4HUuSUMXWEiD76Sh-x0zF3k7r3LeU-vHXb1A4rg0JrigJtYQaVGlVl6nJOVIdNatYxfQaQYQcsjMDCACx8Awt6MOnRlDe7jyj9Rf_j-gIYjGxU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2654886584</pqid></control><display><type>article</type><title>ModelSet: a dataset for machine learning in model-driven engineering</title><source>Springer Nature - Complete Springer Journals</source><creator>López, José Antonio Hernández ; Cánovas Izquierdo, Javier Luis ; Cuadrado, Jesús Sánchez</creator><creatorcontrib>López, José Antonio Hernández ; Cánovas Izquierdo, Javier Luis ; Cuadrado, Jesús Sánchez</creatorcontrib><description>The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software models. There are several reasons for this, including the lack of large collections of good quality models, the difficulty to label models due to the required domain expertise, and the relative immaturity of the application of ML to MDE. In this work, we present ModelSet , a labelled dataset of software models intended to enable the application of ML to address software modelling problems. 
To create it we have devised a method designed to facilitate the exploration and labelling of model datasets by interactively grouping similar models using off-the-shelf technologies like a search engine. We have built an Eclipse plug-in to support the labelling process, which we have used to label 5,466 Ecore meta-models and 5,120 UML models with its category as the main label plus additional secondary labels of interest. We have evaluated the ability of our labelling method to create meaningful groups of models in order to speed up the process, improving the effectiveness of classical clustering methods. We showcase the usefulness of the dataset by applying it in a real scenario: enhancing the MAR search engine. We use ModelSet to train models able to infer useful metadata to navigate search results. The dataset and the tooling are available at https://figshare.com/s/5a6c02fa8ed20782935c and a live version at http://modelset.github.io .</description><identifier>ISSN: 1619-1366</identifier><identifier>EISSN: 1619-1374</identifier><identifier>DOI: 10.1007/s10270-021-00929-3</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Clustering ; Compilers ; Computer Science ; Datasets ; Information Systems Applications (incl.Internet) ; Interpreters ; IT in Business ; Labeling ; Labels ; Machine learning ; Programming Languages ; Programming Techniques ; Search engines ; Software ; Software Engineering ; Software Engineering/Programming and Operating Systems ; Theme Section Paper ; Tooling</subject><ispartof>Software and systems modeling, 2022-06, Vol.21 (3), p.967-986</ispartof><rights>The Author(s) 2021</rights><rights>The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c363t-912d5526175ea113efd6566a57a2876b9f6bacc127541371e2e7244fa1a69a9d3</citedby><cites>FETCH-LOGICAL-c363t-912d5526175ea113efd6566a57a2876b9f6bacc127541371e2e7244fa1a69a9d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10270-021-00929-3$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10270-021-00929-3$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,778,782,27911,27912,41475,42544,51306</link.rule.ids></links><search><creatorcontrib>López, José Antonio Hernández</creatorcontrib><creatorcontrib>Cánovas Izquierdo, Javier Luis</creatorcontrib><creatorcontrib>Cuadrado, Jesús Sánchez</creatorcontrib><title>ModelSet: a dataset for machine learning in model-driven engineering</title><title>Software and systems modeling</title><addtitle>Softw Syst Model</addtitle><description>The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software models. There are several reasons for this, including the lack of large collections of good quality models, the difficulty to label models due to the required domain expertise, and the relative immaturity of the application of ML to MDE. In this work, we present ModelSet , a labelled dataset of software models intended to enable the application of ML to address software modelling problems. 
To create it we have devised a method designed to facilitate the exploration and labelling of model datasets by interactively grouping similar models using off-the-shelf technologies like a search engine. We have built an Eclipse plug-in to support the labelling process, which we have used to label 5,466 Ecore meta-models and 5,120 UML models with its category as the main label plus additional secondary labels of interest. We have evaluated the ability of our labelling method to create meaningful groups of models in order to speed up the process, improving the effectiveness of classical clustering methods. We showcase the usefulness of the dataset by applying it in a real scenario: enhancing the MAR search engine. We use ModelSet to train models able to infer useful metadata to navigate search results. The dataset and the tooling are available at https://figshare.com/s/5a6c02fa8ed20782935c and a live version at http://modelset.github.io .</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Compilers</subject><subject>Computer Science</subject><subject>Datasets</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Interpreters</subject><subject>IT in Business</subject><subject>Labeling</subject><subject>Labels</subject><subject>Machine learning</subject><subject>Programming Languages</subject><subject>Programming Techniques</subject><subject>Search engines</subject><subject>Software</subject><subject>Software Engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Theme Section 
Paper</subject><subject>Tooling</subject><issn>1619-1366</issn><issn>1619-1374</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kE1PwzAMhiMEEhPsD3CKxDkQ58NpuKHxKQ1xAM5R1rqjaGtH0iHx7-koghsnW_L7vrYfxk5AnoGU7jyDVE4KqUBI6ZUXeo9NAMEL0M7s__aIh2yac7OQ0ijvDeKEXT10Fa2eqL_gkVexj5l6XneJr2P52rTEVxRT27RL3rR8vdOKKjUf1HJql8Oc0jA7Zgd1XGWa_tQj9nJz_Ty7E_PH2_vZ5VyUGnUvPKjKWoXgLEUATXWFFjFaF1XhcOFrXMSyBOWsGS4HUuSUMXWEiD76Sh-x0zF3k7r3LeU-vHXb1A4rg0JrigJtYQaVGlVl6nJOVIdNatYxfQaQYQcsjMDCACx8Awt6MOnRlDe7jyj9Rf_j-gIYjGxU</recordid><startdate>20220601</startdate><enddate>20220601</enddate><creator>López, José Antonio Hernández</creator><creator>Cánovas Izquierdo, Javier Luis</creator><creator>Cuadrado, Jesús Sánchez</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20220601</creationdate><title>ModelSet: a dataset for machine learning in model-driven engineering</title><author>López, José Antonio Hernández ; Cánovas Izquierdo, Javier Luis ; Cuadrado, Jesús 
Sánchez</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c363t-912d5526175ea113efd6566a57a2876b9f6bacc127541371e2e7244fa1a69a9d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Compilers</topic><topic>Computer Science</topic><topic>Datasets</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Interpreters</topic><topic>IT in Business</topic><topic>Labeling</topic><topic>Labels</topic><topic>Machine learning</topic><topic>Programming Languages</topic><topic>Programming Techniques</topic><topic>Search engines</topic><topic>Software</topic><topic>Software Engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Theme Section Paper</topic><topic>Tooling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>López, José Antonio Hernández</creatorcontrib><creatorcontrib>Cánovas Izquierdo, Javier Luis</creatorcontrib><creatorcontrib>Cuadrado, Jesús Sánchez</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central 
Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Software and systems modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>López, José Antonio Hernández</au><au>Cánovas Izquierdo, Javier Luis</au><au>Cuadrado, Jesús Sánchez</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ModelSet: a dataset for machine learning in model-driven engineering</atitle><jtitle>Software and systems modeling</jtitle><stitle>Softw Syst Model</stitle><date>2022-06-01</date><risdate>2022</risdate><volume>21</volume><issue>3</issue><spage>967</spage><epage>986</epage><pages>967-986</pages><issn>1619-1366</issn><eissn>1619-1374</eissn><abstract>The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software 
models. There are several reasons for this, including the lack of large collections of good quality models, the difficulty to label models due to the required domain expertise, and the relative immaturity of the application of ML to MDE. In this work, we present ModelSet , a labelled dataset of software models intended to enable the application of ML to address software modelling problems. To create it we have devised a method designed to facilitate the exploration and labelling of model datasets by interactively grouping similar models using off-the-shelf technologies like a search engine. We have built an Eclipse plug-in to support the labelling process, which we have used to label 5,466 Ecore meta-models and 5,120 UML models with its category as the main label plus additional secondary labels of interest. We have evaluated the ability of our labelling method to create meaningful groups of models in order to speed up the process, improving the effectiveness of classical clustering methods. We showcase the usefulness of the dataset by applying it in a real scenario: enhancing the MAR search engine. We use ModelSet to train models able to infer useful metadata to navigate search results. The dataset and the tooling are available at https://figshare.com/s/5a6c02fa8ed20782935c and a live version at http://modelset.github.io .</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s10270-021-00929-3</doi><tpages>20</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1619-1366
ispartof	Software and systems modeling, 2022-06, Vol.21 (3), p.967-986
issn	1619-1366 1619-1374
language	eng
recordid	cdi_proquest_journals_2654886584
source	Springer Nature - Complete Springer Journals
subjects	Algorithms; Clustering; Compilers; Computer Science; Datasets; Information Systems Applications (incl.Internet); Interpreters; IT in Business; Labeling; Labels; Machine learning; Programming Languages; Programming Techniques; Search engines; Software; Software Engineering; Software Engineering/Programming and Operating Systems; Theme Section Paper; Tooling
title	ModelSet: a dataset for machine learning in model-driven engineering
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T05%3A43%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ModelSet:%20a%20dataset%20for%20machine%20learning%20in%20model-driven%20engineering&rft.jtitle=Software%20and%20systems%20modeling&rft.au=L%C3%B3pez,%20Jos%C3%A9%20Antonio%20Hern%C3%A1ndez&rft.date=2022-06-01&rft.volume=21&rft.issue=3&rft.spage=967&rft.epage=986&rft.pages=967-986&rft.issn=1619-1366&rft.eissn=1619-1374&rft_id=info:doi/10.1007/s10270-021-00929-3&rft_dat=%3Cproquest_cross%3E2654886584%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2654886584&rft_id=info:pmid/&rfr_iscdi=true