Enhancing software model encoding for feature location approaches based on machine learning techniques

Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques f...

Detailed description

Bibliographic details
Published in: Software and systems modeling 2022-02, Vol.21 (1), p.399-433
Main authors: Marcén, Ana C., Pérez, Francisca, Pastor, Óscar, Cetina, Carlos
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 433
container_issue 1
container_start_page 399
container_title Software and systems modeling
container_volume 21
creator Marcén, Ana C.
Pérez, Francisca
Pastor, Óscar
Cetina, Carlos
description Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques. The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, an F-measure value of 87.22%, and an MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.
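The description reports precision, recall, F-measure, and MCC but does not say how they are computed. The following is a minimal sketch, assuming the standard confusion-matrix definitions of these four indicators. The function name `confusion_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the article or its case studies.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, F-measure, and MCC from confusion-matrix counts.

    Hypothetical helper using the standard textbook definitions; the article's
    exact evaluation procedure is not described in this record.
    """
    # Precision: fraction of retrieved elements that are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of relevant elements that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # MCC: correlation between predicted and actual labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Illustrative counts only (not data from the article).
example = confusion_metrics(tp=9, fp=1, tn=9, fn=1)
print(example)
```

Under these definitions, a mean F-measure averaged over individual test cases need not equal the harmonic mean of the mean precision and mean recall, which would be consistent with the reported means being per-case averages.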
doi_str_mv 10.1007/s10270-021-00920-y
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2629163006</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2629163006</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d35a957a30cf71c5c03a1179a95f17f6a98351d0a6b687de14b7e076e1f273613</originalsourceid><addsrcrecordid>eNp9UMtOwzAQtBBIVKU_wCkS58Cundj1EVXlIVXiAmfLcew2qLWLnQrl73EIghun3Z2dmV0NIdcItwgg7hICFVACxRJAUiiHMzJDjrJEJqrz357zS7JIqWsAKiplxfmMuLXfaW86vy1ScP2njrY4hNbuC-tNaEfchVg4q_tTXu2D0X0XfKGPxxi02dlUNDrZtsjYIc-dzySrox-VvTU7332cbLoiF07vk1381Dl5e1i_rp7Kzcvj8-p-UxqGsi9bVmtZC83AOIGmNsA0opAZdCgc13LJamxB84YvRWuxaoQFwS06KhhHNic3k2_-brzbq_dwij6fVJRTiZwB8MyiE8vEkFK0Th1jd9BxUAhqjFRNkaocqfqOVA1ZxCZRymS_tfHP-h_VFxHUeqA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2629163006</pqid></control><display><type>article</type><title>Enhancing software model encoding for feature location approaches based on machine learning techniques</title><source>SpringerLink Journals</source><creator>Marcén, Ana C. ; Pérez, Francisca ; Pastor, Óscar ; Cetina, Carlos</creator><creatorcontrib>Marcén, Ana C. ; Pérez, Francisca ; Pastor, Óscar ; Cetina, Carlos</creatorcontrib><description>Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. 
As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques. The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, a F-measure value of 87.22%, and a MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. 
The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.</description><identifier>ISSN: 1619-1366</identifier><identifier>EISSN: 1619-1374</identifier><identifier>DOI: 10.1007/s10270-021-00920-y</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Artificial neural networks ; Case studies ; Compilers ; Computer Science ; Domains ; Information Systems Applications (incl.Internet) ; Interpreters ; IT in Business ; Machine learning ; Neural networks ; Programming Languages ; Programming Techniques ; Recurrent neural networks ; Regular Paper ; Software ; Software Engineering ; Software Engineering/Programming and Operating Systems ; Statistical analysis</subject><ispartof>Software and systems modeling, 2022-02, Vol.21 (1), p.399-433</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d35a957a30cf71c5c03a1179a95f17f6a98351d0a6b687de14b7e076e1f273613</citedby><cites>FETCH-LOGICAL-c319t-d35a957a30cf71c5c03a1179a95f17f6a98351d0a6b687de14b7e076e1f273613</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10270-021-00920-y$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10270-021-00920-y$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Marcén, Ana 
C.</creatorcontrib><creatorcontrib>Pérez, Francisca</creatorcontrib><creatorcontrib>Pastor, Óscar</creatorcontrib><creatorcontrib>Cetina, Carlos</creatorcontrib><title>Enhancing software model encoding for feature location approaches based on machine learning techniques</title><title>Software and systems modeling</title><addtitle>Softw Syst Model</addtitle><description>Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. 
Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques. The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, a F-measure value of 87.22%, and a MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. 
The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.</description><subject>Artificial neural networks</subject><subject>Case studies</subject><subject>Compilers</subject><subject>Computer Science</subject><subject>Domains</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Interpreters</subject><subject>IT in Business</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Programming Languages</subject><subject>Programming Techniques</subject><subject>Recurrent neural networks</subject><subject>Regular Paper</subject><subject>Software</subject><subject>Software Engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Statistical analysis</subject><issn>1619-1366</issn><issn>1619-1374</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9UMtOwzAQtBBIVKU_wCkS58Cundj1EVXlIVXiAmfLcew2qLWLnQrl73EIghun3Z2dmV0NIdcItwgg7hICFVACxRJAUiiHMzJDjrJEJqrz357zS7JIqWsAKiplxfmMuLXfaW86vy1ScP2njrY4hNbuC-tNaEfchVg4q_tTXu2D0X0XfKGPxxi02dlUNDrZtsjYIc-dzySrox-VvTU7332cbLoiF07vk1381Dl5e1i_rp7Kzcvj8-p-UxqGsi9bVmtZC83AOIGmNsA0opAZdCgc13LJamxB84YvRWuxaoQFwS06KhhHNic3k2_-brzbq_dwij6fVJRTiZwB8MyiE8vEkFK0Th1jd9BxUAhqjFRNkaocqfqOVA1ZxCZRymS_tfHP-h_VFxHUeqA</recordid><startdate>20220201</startdate><enddate>20220201</enddate><creator>Marcén, Ana C.</creator><creator>Pérez, Francisca</creator><creator>Pastor, Óscar</creator><creator>Cetina, Carlos</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20220201</creationdate><title>Enhancing software model encoding for feature location approaches based on machine learning techniques</title><author>Marcén, Ana C. ; Pérez, Francisca ; Pastor, Óscar ; Cetina, Carlos</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d35a957a30cf71c5c03a1179a95f17f6a98351d0a6b687de14b7e076e1f273613</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial neural networks</topic><topic>Case studies</topic><topic>Compilers</topic><topic>Computer Science</topic><topic>Domains</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Interpreters</topic><topic>IT in Business</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Programming Languages</topic><topic>Programming Techniques</topic><topic>Recurrent neural networks</topic><topic>Regular Paper</topic><topic>Software</topic><topic>Software Engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Statistical analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Marcén, Ana C.</creatorcontrib><creatorcontrib>Pérez, 
Francisca</creatorcontrib><creatorcontrib>Pastor, Óscar</creatorcontrib><creatorcontrib>Cetina, Carlos</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central 
Basic</collection><jtitle>Software and systems modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Marcén, Ana C.</au><au>Pérez, Francisca</au><au>Pastor, Óscar</au><au>Cetina, Carlos</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing software model encoding for feature location approaches based on machine learning techniques</atitle><jtitle>Software and systems modeling</jtitle><stitle>Softw Syst Model</stitle><date>2022-02-01</date><risdate>2022</risdate><volume>21</volume><issue>1</issue><spage>399</spage><epage>433</epage><pages>399-433</pages><issn>1619-1366</issn><eissn>1619-1374</eissn><abstract>Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). 
The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques. The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, a F-measure value of 87.22%, and a MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s10270-021-00920-y</doi><tpages>35</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1619-1366
ispartof Software and systems modeling, 2022-02, Vol.21 (1), p.399-433
issn 1619-1366
1619-1374
language eng
recordid cdi_proquest_journals_2629163006
source SpringerLink Journals
subjects Artificial neural networks
Case studies
Compilers
Computer Science
Domains
Information Systems Applications (incl.Internet)
Interpreters
IT in Business
Machine learning
Neural networks
Programming Languages
Programming Techniques
Recurrent neural networks
Regular Paper
Software
Software Engineering
Software Engineering/Programming and Operating Systems
Statistical analysis
title Enhancing software model encoding for feature location approaches based on machine learning techniques
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T02%3A17%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20software%20model%20encoding%20for%20feature%20location%20approaches%20based%20on%20machine%20learning%20techniques&rft.jtitle=Software%20and%20systems%20modeling&rft.au=Marc%C3%A9n,%20Ana%20C.&rft.date=2022-02-01&rft.volume=21&rft.issue=1&rft.spage=399&rft.epage=433&rft.pages=399-433&rft.issn=1619-1366&rft.eissn=1619-1374&rft_id=info:doi/10.1007/s10270-021-00920-y&rft_dat=%3Cproquest_cross%3E2629163006%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2629163006&rft_id=info:pmid/&rfr_iscdi=true