Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization

Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different proj...

Bibliographic details
Published in:	IEEE access 2024-01, Vol.12, p.1-1
Main authors:	Kaliraj, S; Kishoore, A M; Sivakumar, V
Format:	Article
Language:	eng
Subjects:	Accuracy; Class Imbalance; Classifiers; Cross-Project Analysis; Datasets; Fault detection; Fault diagnosis; Impact analysis; Machine learning; Machine Learning Classifiers; Measurement; Model Generalization; Performance measurement; Performance Metrics; Performance prediction; Prediction models; Predictive models; Questions; Reliability analysis; Robustness; Software; Software engineering; Software Fault Prediction; Software reliability; Training
Online access:	Full text

container_end_page	1
container_issue
container_start_page	1
container_title	IEEE access
container_volume	12
creator	Kaliraj, S; Kishoore, A M; Sivakumar, V
description	Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.
doi_str_mv	10.1109/ACCESS.2024.3397494
format	Article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_3053299181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10521510</ieee_id><doaj_id>oai_doaj_org_article_e173c47356da4e5991645e958fa92c54</doaj_id><sourcerecordid>3053299181</sourcerecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053299181</pqid></control><display><type>article</type><title>Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization</title><creator>Kaliraj, S ; Kishoore, A M ; Sivakumar, V</creator><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3397494</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Class Imbalance ; Classifiers ; Cross-Project Analysis ; Datasets ; Fault detection ; Fault diagnosis ; Impact analysis ; Machine learning ; Machine Learning Classifiers ; Measurement ; Model Generalization ; Performance measurement ; Performance Metrics ; Performance prediction ; Prediction models ; Predictive models ; Questions ; Reliability analysis ; Robustness ; Software ; Software engineering ; Software Fault Prediction ; Software reliability ; Training</subject><ispartof>IEEE access, 2024-01, Vol.12, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><orcidid>0000-0001-9910-726X ; 0000-0003-4212-8427</orcidid></display><links><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10521510$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml></links></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2024-01, Vol.12, p.1-1
issn	2169-3536 2169-3536
language	eng
recordid	cdi_proquest_journals_3053299181
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Accuracy; Class Imbalance; Classifiers; Cross-Project Analysis; Datasets; Fault detection; Fault diagnosis; Impact analysis; Machine learning; Machine Learning Classifiers; Measurement; Model Generalization; Performance measurement; Performance Metrics; Performance prediction; Prediction models; Predictive models; Questions; Reliability analysis; Robustness; Software; Software engineering; Software Fault Prediction; Software reliability; Training
title	Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T02%3A35%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Software%20Fault%20Prediction%20using%20Cross-Project%20Analysis:%20A%20Study%20on%20Class%20Imbalance%20and%20Model%20Generalization&rft.jtitle=IEEE%20access&rft.au=Kaliraj,%20S&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3397494&rft_dat=%3Cproquest_ieee_%3E3053299181%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3053299181&rft_id=info:pmid/&rft_ieee_id=10521510&rft_doaj_id=oai_doaj_org_article_e173c47356da4e5991645e958fa92c54&rfr_iscdi=true