Enhancing software model encoding for feature location approaches based on machine learning techniques
Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques f...
Published in: | Software and systems modeling 2022-02, Vol.21 (1), p.399-433 |
---|---|
Main authors: | Marcén, Ana C.; Pérez, Francisca; Pastor, Óscar; Cetina, Carlos |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 433 |
---|---|
container_issue | 1 |
container_start_page | 399 |
container_title | Software and systems modeling |
container_volume | 21 |
creator | Marcén, Ana C.; Pérez, Francisca; Pastor, Óscar; Cetina, Carlos |
description | Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques. The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, an F-measure value of 87.22%, and an MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location. |
doi_str_mv | 10.1007/s10270-021-00920-y |
format | Article |
fullrecord | source: SpringerLink Journals (record ID TN_cdi_proquest_journals_2629163006); publisher: Springer Berlin Heidelberg, Berlin/Heidelberg; abbreviated journal title: Softw Syst Model; publication date: 2022-02-01; total pages: 35; peer reviewed; rights: The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 |
fulltext | fulltext |
identifier | ISSN: 1619-1366 |
ispartof | Software and systems modeling, 2022-02, Vol.21 (1), p.399-433 |
issn | 1619-1366 (ISSN); 1619-1374 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2629163006 |
source | SpringerLink Journals |
subjects | Artificial neural networks; Case studies; Compilers; Computer Science; Domains; Information Systems Applications (incl.Internet); Interpreters; IT in Business; Machine learning; Neural networks; Programming Languages; Programming Techniques; Recurrent neural networks; Regular Paper; Software; Software Engineering; Software Engineering/Programming and Operating Systems; Statistical analysis |
title | Enhancing software model encoding for feature location approaches based on machine learning techniques |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T02%3A17%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20software%20model%20encoding%20for%20feature%20location%20approaches%20based%20on%20machine%20learning%20techniques&rft.jtitle=Software%20and%20systems%20modeling&rft.au=Marc%C3%A9n,%20Ana%20C.&rft.date=2022-02-01&rft.volume=21&rft.issue=1&rft.spage=399&rft.epage=433&rft.pages=399-433&rft.issn=1619-1366&rft.eissn=1619-1374&rft_id=info:doi/10.1007/s10270-021-00920-y&rft_dat=%3Cproquest_cross%3E2629163006%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2629163006&rft_id=info:pmid/&rfr_iscdi=true |
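
The abstract above reports precision, recall, F-measure, and MCC for the located model fragments. The record does not say how the authors computed them, so the following is only a minimal sketch using the standard confusion-matrix definitions. It assumes a feature realization can be treated as a set of model elements; the element names (`e0` … `e9`) and the helper functions are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(predicted, oracle, universe):
    """Count TP, FP, FN, TN when comparing two sets of model elements."""
    tp = len(predicted & oracle)             # elements correctly located
    fp = len(predicted - oracle)             # located but not in the oracle
    fn = len(oracle - predicted)             # in the oracle but missed
    tn = len(universe - predicted - oracle)  # correctly left out
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard precision, recall, F-measure, and Matthews correlation."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc}


# Hypothetical model with ten elements (assumption, for illustration only).
universe = {f"e{i}" for i in range(10)}
oracle = {"e0", "e1", "e2", "e3"}      # ground-truth feature realization
predicted = {"e0", "e1", "e2", "e4"}   # fragment returned by some approach

counts = confusion_counts(predicted, oracle, universe)
result = metrics(*counts)
print(counts)
print(result)
```

If the reported figures are averages over several test cases, the mean F-measure does not have to equal the harmonic mean of the mean precision and mean recall, which would be one possible reading of why 87.22% differs from what 90.11% and 86.20% alone would give.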