Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics
Published in: | Baltic Journal of Modern Computing 2019, Vol.7 (1), p.31-46 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 46 |
---|---|
container_issue | 1 |
container_start_page | 31 |
container_title | Baltic Journal of Modern Computing |
container_volume | 7 |
creator | Öztürk, Muhammed Maruf |
description | Defect prediction is not a new sub-field of software engineering, yet it still poses research problems that attract many researchers. Cross-project defect prediction (CPDP), one of the popular topics in defect prediction, remains an open problem. To address it, instance-focused or feature-focused experiments are generally performed, but novel pre-processing methods are lacking. The main objective of this work is to propose a complexity-based fuzzy clustering method that helps select the training data for CPDP. This also creates an opportunity to compare static code metrics with process metrics. The method is proposed for process metrics: fuzzy membership levels are associated with a complexity value based on process metrics. The experiment employs 18 data sets that include both static code and process metrics. The findings show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperform static code metrics in terms of the Matthews correlation coefficient (MCC) and F-measure. Furthermore, for the data sets used, no linear model was detected among the process metrics number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that training instance selection for CPDP is notably successful with process metrics. Moreover, in overall performance, process metrics are quite suitable for clustering-based instance selection. |
doi_str_mv | 10.22364/bjmc.2019.7.1.03 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2209951535</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2209951535</sourcerecordid><originalsourceid>FETCH-LOGICAL-c268t-75775f6c40162fff446bcbc00d0a54ac48a33502b26548cec75b5314c7743d0e3</originalsourceid><addsrcrecordid>eNpNkE1LAzEURYMoWGp_gLuA6xlfviYzSxmqFioKKrgLmTeJpLSdmqRi_72tVXD17oPDvXAIuWRQci4qed0tVlhyYE2pS1aCOCEjzpUq6kbB6b98TiYpLQCAqVrwGkbkbZZoO6w2S_cV8q7obHI9bZfblF0M63c6ePoUB3Qp0QeXY8BEbaJT7x3m8OkOT1jT52xzwH1R7_6wC3Lm7TK5ye8dk9fb6Ut7X8wf72btzbxAXtW50Epr5SuUwCruvZey6rBDgB6skhZlbYVQwDteKVmjQ606JZhEraXowYkxuTr2buLwsXUpm8Wwjev9pOEcmkYxJdSeYkcK45BSdN5sYljZuDMMzI9Dc3BoDg6NNsyAEN9uFGQ6</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2209951535</pqid></control><display><type>article</type><title>Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics</title><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Öztürk, Muhammed Maruf</creator><creatorcontrib>Öztürk, Muhammed Maruf</creatorcontrib><description>Defect prediction is not a new sub-field of software engineering. However, in this field, there are various research problems that are still attractive for many researchers. Cross-project defect prediction (CPDP), which is one of the popular issues of defect prediction, is still intriguing. To address this problem, generally instances or feature-focused experiments are performed but there is a lack of novel pre-processing methods. The main objective of this work is to propose a fuzzy clustering method that is based on complexity in CPDP. It helps selecting training data of CPDP. Hence, an opportunity that provides comparing static code and process metrics emerges. 
In this work, complexity-based fuzzy clustering that helps to select training instances of CPDP is proposed for process metrics. In the method, fuzzy membership levels are associated with a complexity value based on process metrics. In the experiment, 18 data sets including static code and process metrics together are employed. The findings of the experiment show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperforms static code metrics in matthew's correlation coefficient (MCC) and F-measure parameters. Furthermore, in accordance with the used data sets, it has been detected that there is not any linear model among process metrics including number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that the approach on the basis of training instance selection of CPDP yields remarkable success in process metrics. Moreover, in overall performance, process metrics are rather suitable for clustering-based instance selection.</description><identifier>ISSN: 2255-8950</identifier><identifier>ISSN: 2255-8942</identifier><identifier>EISSN: 2255-8950</identifier><identifier>DOI: 10.22364/bjmc.2019.7.1.03</identifier><language>eng</language><publisher>Riga: University of Latvia</publisher><subject>Clustering ; Complexity ; Correlation analysis ; Correlation coefficients ; Data mining ; Datasets ; Defects ; Fuzzy systems ; International conferences ; Knowledge management ; Methods ; Researchers ; Software engineering ; Software quality ; Software reliability ; Training</subject><ispartof>Baltic Journal of Modern Computing, 2019, Vol.7 (1), p.31-46</ispartof><rights>2019. This work is published under https://creativecommons.org/licenses/by-sa/4.0 (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,864,4024,27923,27924,27925</link.rule.ids></links><search><creatorcontrib>Öztürk, Muhammed Maruf</creatorcontrib><title>Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics</title><title>Baltic Journal of Modern Computing</title><description>Defect prediction is not a new sub-field of software engineering. However, in this field, there are various research problems that are still attractive for many researchers. Cross-project defect prediction (CPDP), which is one of the popular issues of defect prediction, is still intriguing. To address this problem, generally instances or feature-focused experiments are performed but there is a lack of novel pre-processing methods. The main objective of this work is to propose a fuzzy clustering method that is based on complexity in CPDP. It helps selecting training data of CPDP. Hence, an opportunity that provides comparing static code and process metrics emerges. In this work, complexity-based fuzzy clustering that helps to select training instances of CPDP is proposed for process metrics. In the method, fuzzy membership levels are associated with a complexity value based on process metrics. In the experiment, 18 data sets including static code and process metrics together are employed. The findings of the experiment show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperforms static code metrics in matthew's correlation coefficient (MCC) and F-measure parameters. 
Furthermore, in accordance with the used data sets, it has been detected that there is not any linear model among process metrics including number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that the approach on the basis of training instance selection of CPDP yields remarkable success in process metrics. Moreover, in overall performance, process metrics are rather suitable for clustering-based instance selection.</description><subject>Clustering</subject><subject>Complexity</subject><subject>Correlation analysis</subject><subject>Correlation coefficients</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Defects</subject><subject>Fuzzy systems</subject><subject>International conferences</subject><subject>Knowledge management</subject><subject>Methods</subject><subject>Researchers</subject><subject>Software engineering</subject><subject>Software quality</subject><subject>Software reliability</subject><subject>Training</subject><issn>2255-8950</issn><issn>2255-8942</issn><issn>2255-8950</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNpNkE1LAzEURYMoWGp_gLuA6xlfviYzSxmqFioKKrgLmTeJpLSdmqRi_72tVXD17oPDvXAIuWRQci4qed0tVlhyYE2pS1aCOCEjzpUq6kbB6b98TiYpLQCAqVrwGkbkbZZoO6w2S_cV8q7obHI9bZfblF0M63c6ePoUB3Qp0QeXY8BEbaJT7x3m8OkOT1jT52xzwH1R7_6wC3Lm7TK5ye8dk9fb6Ut7X8wf72btzbxAXtW50Epr5SuUwCruvZey6rBDgB6skhZlbYVQwDteKVmjQ606JZhEraXowYkxuTr2buLwsXUpm8Wwjev9pOEcmkYxJdSeYkcK45BSdN5sYljZuDMMzI9Dc3BoDg6NNsyAEN9uFGQ6</recordid><startdate>2019</startdate><enddate>2019</enddate><creator>Öztürk, Muhammed Maruf</creator><general>University of 
Latvia</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>2019</creationdate><title>Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics</title><author>Öztürk, Muhammed Maruf</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c268t-75775f6c40162fff446bcbc00d0a54ac48a33502b26548cec75b5314c7743d0e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Clustering</topic><topic>Complexity</topic><topic>Correlation analysis</topic><topic>Correlation coefficients</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Defects</topic><topic>Fuzzy systems</topic><topic>International conferences</topic><topic>Knowledge management</topic><topic>Methods</topic><topic>Researchers</topic><topic>Software engineering</topic><topic>Software quality</topic><topic>Software reliability</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Öztürk, Muhammed Maruf</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni 
Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Baltic Journal of Modern Computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Öztürk, Muhammed Maruf</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Is Complexity-based Clustering of Process Metrics as Effective as in Static Code 
Metrics</atitle><jtitle>Baltic Journal of Modern Computing</jtitle><date>2019</date><risdate>2019</risdate><volume>7</volume><issue>1</issue><spage>31</spage><epage>46</epage><pages>31-46</pages><issn>2255-8950</issn><issn>2255-8942</issn><eissn>2255-8950</eissn><abstract>Defect prediction is not a new sub-field of software engineering. However, in this field, there are various research problems that are still attractive for many researchers. Cross-project defect prediction (CPDP), which is one of the popular issues of defect prediction, is still intriguing. To address this problem, generally instances or feature-focused experiments are performed but there is a lack of novel pre-processing methods. The main objective of this work is to propose a fuzzy clustering method that is based on complexity in CPDP. It helps selecting training data of CPDP. Hence, an opportunity that provides comparing static code and process metrics emerges. In this work, complexity-based fuzzy clustering that helps to select training instances of CPDP is proposed for process metrics. In the method, fuzzy membership levels are associated with a complexity value based on process metrics. In the experiment, 18 data sets including static code and process metrics together are employed. The findings of the experiment show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperforms static code metrics in matthew's correlation coefficient (MCC) and F-measure parameters. Furthermore, in accordance with the used data sets, it has been detected that there is not any linear model among process metrics including number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that the approach on the basis of training instance selection of CPDP yields remarkable success in process metrics. 
Moreover, in overall performance, process metrics are rather suitable for clustering-based instance selection.</abstract><cop>Riga</cop><pub>University of Latvia</pub><doi>10.22364/bjmc.2019.7.1.03</doi><tpages>16</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2255-8950 |
ispartof | Baltic Journal of Modern Computing, 2019, Vol.7 (1), p.31-46 |
issn | 2255-8950 2255-8942 2255-8950 |
language | eng |
recordid | cdi_proquest_journals_2209951535 |
source | DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Clustering; Complexity; Correlation analysis; Correlation coefficients; Data mining; Datasets; Defects; Fuzzy systems; International conferences; Knowledge management; Methods; Researchers; Software engineering; Software quality; Software reliability; Training |
title | Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T02%3A40%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Is%20Complexity-based%20Clustering%20of%20Process%20Metrics%20as%20Effective%20as%20in%20Static%20Code%20Metrics&rft.jtitle=Baltic%20Journal%20of%20Modern%20Computing&rft.au=%C3%96zt%C3%BCrk,%20Muhammed%20Maruf&rft.date=2019&rft.volume=7&rft.issue=1&rft.spage=31&rft.epage=46&rft.pages=31-46&rft.issn=2255-8950&rft.eissn=2255-8950&rft_id=info:doi/10.22364/bjmc.2019.7.1.03&rft_dat=%3Cproquest_cross%3E2209951535%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2209951535&rft_id=info:pmid/&rfr_iscdi=true |
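
Note on the evaluation measures named in the description: the abstract reports that the ranking of static code and process metrics differs between AUC and the MCC and F-measure. As a minimal sketch, the two confusion-matrix measures can be computed with their standard textbook definitions as shown below. The function names and the example counts are illustrative assumptions; they are not taken from the article and say nothing about how the author implemented the evaluation.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (F1): harmonic mean of precision and recall.

    Written as 2*TP / (2*TP + FP + FN); returns 0.0 when undefined.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 to 1; returns 0.0 when any marginal total is zero.
    """
    numerator = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced defect data set
    # (10 defective modules, 90 clean modules).
    tp, tn, fp, fn = 8, 88, 2, 2
    print(f"F-measure: {f_measure(tp, fp, fn):.3f}")
    print(f"MCC:       {mcc(tp, tn, fp, fn):.3f}")
```

Unlike the F-measure, MCC also uses the true-negative count, so the two measures can diverge on imbalanced defect data sets.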