Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics

Defect prediction is not a new sub-field of software engineering. However, in this field, there are various research problems that are still attractive for many researchers. Cross-project defect prediction (CPDP), which is one of the popular issues of defect prediction, is still intriguing. To addre...

Detailed description

Bibliographic details
Published in: Baltic Journal of Modern Computing, 2019, Vol. 7 (1), p. 31-46
Main author: Öztürk, Muhammed Maruf
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 46
container_issue 1
container_start_page 31
container_title Baltic Journal of Modern Computing
container_volume 7
creator Öztürk, Muhammed Maruf
description Defect prediction is not a new sub-field of software engineering, yet it still poses research problems that attract many researchers. Cross-project defect prediction (CPDP), one of the popular topics in defect prediction, remains an open problem. To address it, instance-focused or feature-focused experiments are generally performed, but novel pre-processing methods are lacking. The main objective of this work is to propose a complexity-based fuzzy clustering method for CPDP that helps select the training data. This also creates an opportunity to compare static code metrics with process metrics. In this work, complexity-based fuzzy clustering that helps select training instances for CPDP is proposed for process metrics. In the method, fuzzy membership levels are associated with a complexity value based on process metrics. The experiment employs 18 data sets that include both static code and process metrics. The findings show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperform static code metrics in Matthews correlation coefficient (MCC) and F-measure. Furthermore, for the data sets used, no linear model was found among the process metrics, which include number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that an approach based on training instance selection for CPDP yields remarkable success with process metrics. Moreover, in overall performance, process metrics are rather suitable for clustering-based instance selection.
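The description compares the two metric families by AUC, MCC, and F-measure. As a reminder of what the latter two measure, the following is a minimal Python sketch using the standard confusion-matrix definitions. The function names and the example counts are illustrative assumptions and are not taken from the article.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a defect predictor (not results from the article):
# 30 defective modules found, 50 clean modules correctly passed,
# 10 false alarms, 10 missed defects.
example_f = f_measure(tp=30, fp=10, fn=10)
example_mcc = mcc(tp=30, tn=50, fp=10, fn=10)
print(f"F-measure = {example_f:.3f}, MCC = {example_mcc:.3f}")
```

Unlike F-measure, MCC also uses the true-negative count, so the two can rank the same classifiers differently on imbalanced defect data.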
doi_str_mv 10.22364/bjmc.2019.7.1.03
format Article
publisher Riga: University of Latvia
rights 2019. This work is published under https://creativecommons.org/licenses/by-sa/4.0 (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
toplevel peer_reviewed; online_resources
oa free_for_read
genre article
creationdate 2019
tpages 16
eissn 2255-8950
fulltext fulltext
identifier ISSN: 2255-8950
ispartof Baltic Journal of Modern Computing, 2019, Vol.7 (1), p.31-46
issn 2255-8950
2255-8942
2255-8950
language eng
recordid cdi_proquest_journals_2209951535
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Clustering
Complexity
Correlation analysis
Correlation coefficients
Data mining
Datasets
Defects
Fuzzy systems
International conferences
Knowledge management
Methods
Researchers
Software engineering
Software quality
Software reliability
Training
title Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T02%3A40%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Is%20Complexity-based%20Clustering%20of%20Process%20Metrics%20as%20Effective%20as%20in%20Static%20Code%20Metrics&rft.jtitle=Baltic%20Journal%20of%20Modern%20Computing&rft.au=%C3%96zt%C3%BCrk,%20Muhammed%20Maruf&rft.date=2019&rft.volume=7&rft.issue=1&rft.spage=31&rft.epage=46&rft.pages=31-46&rft.issn=2255-8950&rft.eissn=2255-8950&rft_id=info:doi/10.22364/bjmc.2019.7.1.03&rft_dat=%3Cproquest_cross%3E2209951535%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2209951535&rft_id=info:pmid/&rfr_iscdi=true