Enhancing software model encoding for feature location approaches based on machine learning techniques

Bibliographic details
Published in:	Software and systems modeling, 2022-02, Vol. 21 (1), p. 399-433
Main authors:	Marcén, Ana C., Pérez, Francisca, Pastor, Óscar, Cetina, Carlos
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Case studies; Compilers; Computer Science; Domains; Information Systems Applications (incl.Internet); Interpreters; IT in Business; Machine learning; Neural networks; Programming Languages; Programming Techniques; Recurrent neural networks; Regular Paper; Software; Software Engineering; Software Engineering/Programming and Operating Systems; Statistical analysis

description	Feature location is one of the main activities performed during software evolution. In our previous work, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, a well-designed encoding is essential for identifying the best realization of a feature. In this work, we present a more thorough study of software model encoding for feature location approaches based on machine learning. As part of this study, we propose two new software model encodings and compare them with the source encoding. The first proposed encoding extends the source encoding to exploit not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for Learning to Rank research. The new encodings are then used to compare three machine learning techniques: RankBoost, a Feedforward Neural Network, and a Recurrent Neural Network. The study also considers whether a domain-independent encoding, such as the ones proposed in this work, can outperform an encoding specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared with two traditional approaches that have been widely applied to feature location as well as to traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. Both case studies are evaluated with an approach for feature location in models, using each of the encodings and machine learning techniques. 
The results show that, when using the second proposed encoding and RankBoost, the approach outperforms the other encodings and machine learning techniques as well as the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, with a mean precision of 90.11%, a recall of 86.20%, an F-measure of 87.22%, and a Matthews correlation coefficient (MCC) of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.
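The abstract reports four performance indicators. Below is a minimal sketch of how such indicators are commonly computed for one feature-location test case. It assumes that the retrieved model fragment and the oracle (ground-truth) fragment are sets of hypothetical model-element IDs, and that every other element of the product model counts as a true negative for MCC. The names `score_fragment` and `mean_scores` are illustrative only and are not taken from the paper.

The sketch also shows one point about averaging. When each indicator is averaged over test cases, the mean F-measure need not equal the harmonic mean of mean precision and recall. With the reported figures, the harmonic mean of 90.11% and 86.20% is about 88.1%, not 87.22%, which is presumably because the values are averaged per case.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_measure: float
    mcc: float


def score_fragment(retrieved: set[str], oracle: set[str], all_elements: set[str]) -> Scores:
    """Score one retrieved model fragment against its oracle fragment.

    Element IDs are hypothetical strings; `all_elements` is every element of
    the product model and is used to count true negatives for MCC.
    """
    tp = len(retrieved & oracle)                 # correctly retrieved elements
    fp = len(retrieved - oracle)                 # retrieved but not in the feature
    fn = len(oracle - retrieved)                 # in the feature but missed
    tn = len(all_elements - retrieved - oracle)  # correctly left out

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return Scores(precision, recall, f_measure, mcc)


def mean_scores(cases: list[Scores]) -> Scores:
    """Average each indicator over test cases (per-case mean, not pooled counts)."""
    n = len(cases)
    return Scores(
        precision=sum(c.precision for c in cases) / n,
        recall=sum(c.recall for c in cases) / n,
        f_measure=sum(c.f_measure for c in cases) / n,
        mcc=sum(c.mcc for c in cases) / n,
    )


if __name__ == "__main__":
    model = {f"e{i}" for i in range(1, 11)}  # toy product model with 10 elements
    case1 = score_fragment({"e1", "e2", "e3", "e5"}, {"e1", "e2", "e3", "e4"}, model)
    case2 = score_fragment({"e1", "e2", "e3", "e4"}, {"e1", "e2"}, model)
    mean = mean_scores([case1, case2])
    f_of_means = 2 * mean.precision * mean.recall / (mean.precision + mean.recall)
    print(mean.f_measure, f_of_means)  # the two values differ
```

Averaging per case gives each test case equal weight, whatever the size of its feature. Pooling counts across cases would instead favor large features.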
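The abstract says only that the second encoding is "inspired by the characteristics used in benchmark datasets for Learning to Rank research." Such benchmarks (for example, the LETOR collections) describe each query-document pair with numeric features such as term frequency, inverse document frequency, TF-IDF, document length, and BM25.

Below is a minimal sketch of such features. It assumes that a feature description plays the role of the query and that the text of a candidate model fragment (for instance, its element names) plays the role of the document. The function `letor_style_features`, the whitespace tokenizer, and the BM25 parameters `k1` and `b` are assumptions made for illustration. This is not the encoding defined in the paper.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def letor_style_features(query: str, fragment: str, corpus: list[str],
                         k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return [sum TF, sum IDF, sum TF-IDF, fragment length, BM25] for one
    hypothetical (feature description, model fragment) pair.

    `corpus` holds the text of all candidate fragments; it supplies the
    document frequencies and the average fragment length.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)
    avg_len = (sum(len(tokenize(d)) for d in corpus) / n) or 1.0

    terms = tokenize(fragment)
    tf = Counter(terms)
    length = len(terms)

    def idf(term: str) -> float:
        df = sum(1 for d in docs if term in d)
        # BM25-style IDF with +1 inside the log so it never goes negative.
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    q_terms = tokenize(query)
    sum_tf = float(sum(tf[t] for t in q_terms))
    sum_idf = sum(idf(t) for t in q_terms)
    sum_tfidf = sum(tf[t] * idf(t) for t in q_terms)
    # BM25 length normalization relative to the average fragment length.
    norm = k1 * (1 - b + b * length / avg_len)
    bm25 = sum(idf(t) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in q_terms)
    return [sum_tf, sum_idf, sum_tfidf, float(length), bm25]


if __name__ == "__main__":
    fragments = ["train door control", "door sensor", "brake control unit"]
    for frag in fragments:
        print(frag, letor_style_features("door control", frag, fragments))
```

Each candidate fragment then becomes one fixed-length vector, which a ranker such as RankBoost can consume.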
doi_str_mv	10.1007/s10270-021-00920-y
publisher	Springer Berlin Heidelberg, Berlin/Heidelberg
rights	The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
issn	1619-1366 (ISSN), 1619-1374 (EISSN)
recordid	cdi_proquest_journals_2629163006
source	SpringerLink Journals
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T02%3A17%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20software%20model%20encoding%20for%20feature%20location%20approaches%20based%20on%20machine%20learning%20techniques&rft.jtitle=Software%20and%20systems%20modeling&rft.au=Marc%C3%A9n,%20Ana%20C.&rft.date=2022-02-01&rft.volume=21&rft.issue=1&rft.spage=399&rft.epage=433&rft.pages=399-433&rft.issn=1619-1366&rft.eissn=1619-1374&rft_id=info:doi/10.1007/s10270-021-00920-y&rft_dat=%3Cproquest_cross%3E2629163006%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2629163006&rft_id=info:pmid/&rfr_iscdi=true