A New Framework Consisted of Data Preprocessing and Classifier Modelling for Software Defect Prediction

Bibliographic details
Published in:	Mathematical problems in engineering 2018-01, Vol.2018 (2018), p.1-13
Main authors:	Ji, Haijin; Huang, Song
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Classifiers; Clustering; Computer science; Datasets; Defects; Efficiency; Evaluation; Experimentation; International conferences; Kolmogorov-Smirnov test; Open source software; Performance evaluation; Preprocessing; Random noise; Researchers; Software engineering; Software quality; Source code
Online access:	Full text

container_end_page	13
container_issue	2018
container_start_page	1
container_title	Mathematical problems in engineering
container_volume	2018
creator	Ji, Haijin; Huang, Song
description	Various data preprocessing methods and classifiers have previously been proposed and evaluated for software defect prediction (SDP) across projects, and these approaches have produced reasonably acceptable prediction results for different software projects. However, to the best of our knowledge, few researchers have combined data preprocessing with the construction of a robust classifier to improve SDP performance. This paper therefore presents a complete framework for predicting fault-prone software modules. The proposed framework consists of instance filtering, feature selection, instance reduction, and a new classifier. A Kolmogorov-Smirnov test further shows that the 21 main software metrics commonly follow non-normal distributions, so the proposed classifier is built on the maximum correntropy criterion (MCC), which is well known for its effectiveness in handling non-Gaussian noise. To evaluate the framework, we designed an experimental study using nine open-source software projects comprising 32 releases, obtained from the PROMISE data repository. Prediction accuracy is evaluated with the F-measure, and state-of-the-art methods for cross-project defect prediction are included for comparison. All of the evidence derived from the experiments verifies the effectiveness and robustness of the new framework.
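The abstract names the maximum correntropy criterion (MCC) and the F-measure but does not give the classifier's formulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: a linear decision function f(x) = w·x + b, labels +1 (defective) and -1 (clean), a Gaussian kernel of width sigma = 1, and plain batch gradient ascent on the sample correntropy of the errors e_i = y_i - f(x_i). The function names (`correntropy`, `train_mcc_linear`, `predict`, `f_measure`) and the toy data are hypothetical. Each sample's gradient term is scaled by its kernel value exp(-e^2 / (2 sigma^2)), so a sample with a large error, such as a mislabeled module, contributes almost nothing; under a squared-error loss its influence would instead grow with the error.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def correntropy(errors: Sequence[float], sigma: float) -> float:
    """Sample correntropy with a Gaussian kernel: mean of exp(-e^2 / (2 sigma^2))."""
    return sum(math.exp(-e * e / (2 * sigma**2)) for e in errors) / len(errors)


def train_mcc_linear(X: Sequence[Vector], y: Sequence[int], sigma: float = 1.0,
                     lr: float = 0.05, epochs: int = 2000) -> Tuple[Vector, float]:
    """Hypothetical MCC classifier: fit f(x) = w.x + b by gradient ascent on correntropy.

    Assumes labels +1 (defective) / -1 (clean). The gradient of the correntropy
    with respect to w is mean(g_i * e_i * x_i) / sigma^2, where g_i is the kernel
    value of sample i, so samples with large errors are heavily down-weighted.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            e = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)  # residual
            g = math.exp(-e * e / (2 * sigma**2))  # kernel weight of this sample
            for j in range(d):
                grad_w[j] += g * e * xi[j] / sigma**2
            grad_b += g * e / sigma**2
        # Ascent step: move in the direction that increases correntropy.
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, X: Sequence[Vector]) -> List[int]:
    """Label a module defective (+1) when f(x) > 0, otherwise clean (-1)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2PR / (P + R), treating the defective class as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (hypothetical): two clusters plus one mislabeled point at (10, 10).
    X = [[2, 2], [3, 2], [2, 3], [3, 3], [-2, -2], [-3, -2], [-2, -3], [-3, -3], [10, 10]]
    y = [1, 1, 1, 1, -1, -1, -1, -1, -1]
    w, b = train_mcc_linear(X, y)
    y_pred = predict(w, b, X)
    print("predictions:", y_pred)
    print("F-measure:", f_measure(y, y_pred))
```

On this toy data, training drives the mislabeled point's kernel weight toward zero, so the decision boundary is set by the eight consistent points. The kernel width sigma controls how quickly large errors are discounted; the value used here is an arbitrary choice for illustration.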
doi_str_mv	10.1155/2018/9616938
format	Article
contributor	Lu, Helen
publisher	Cairo, Egypt: Hindawi Publishing Corporation
date	2018-01-01
rights	Copyright © 2018 Haijin Ji and Song Huang. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0
toplevel	peer_reviewed; online_resources
oa	free_for_read
orcidid	https://orcid.org/0000-0003-3624-7673; https://orcid.org/0000-0002-6894-3916
fulltext	fulltext
identifier	ISSN: 1024-123X
ispartof	Mathematical problems in engineering, 2018-01, Vol.2018 (2018), p.1-13
issn	1024-123X (ISSN); 1563-5147 (EISSN)
language	eng
recordid	cdi_proquest_journals_2105003595
source	Wiley Online Library Open Access; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection
title	A New Framework Consisted of Data Preprocessing and Classifier Modelling for Software Defect Prediction