Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization

Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different proj...

Bibliographic details
Published in: IEEE access 2024-01, Vol.12, p.1-1
Main authors: Kaliraj, S, Kishoore, A M, Sivakumar, V
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 12
creator Kaliraj, S
Kishoore, A M
Sivakumar, V
description Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. 
This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.
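The description above names accuracy, precision, recall, and F1 score as the comparison metrics and stresses class imbalance. As a minimal sketch of why accuracy alone can mislead on imbalanced fault data, the following Python computes the four metrics from binary labels. This is an illustration only, not the authors' code: the function name `classification_metrics` and the label vectors are hypothetical and are not taken from the article.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = faulty)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # faulty, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # faulty, missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or labelled faulty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 5 faulty modules out of 100.
y_true = [1] * 5 + [0] * 95

# A predictor that labels every module as clean.
baseline = classification_metrics(y_true, [0] * 100)

# A predictor that finds 4 of the 5 faults and raises 6 false alarms.
detector = classification_metrics(y_true, [1] * 4 + [0] + [1] * 6 + [0] * 89)

print(baseline)
print(detector)
```

On this made-up data the always-clean predictor has the higher accuracy (0.95 against 0.93) but a recall of 0, while the second predictor recovers 80% of the faults. That gap is the reason imbalanced fault data is usually judged on precision, recall, and F1 rather than accuracy alone.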
doi_str_mv 10.1109/ACCESS.2024.3397494
format Article
coden IAECCG
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
peer_reviewed true
oa free_for_read
orcid https://orcid.org/0000-0001-9910-726X; https://orcid.org/0000-0003-4212-8427
ieee_id 10521510
doaj_id oai_doaj_org_article_e173c47356da4e5991645e958fa92c54
pqid 3053299181
linktohtml https://ieeexplore.ieee.org/document/10521510
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024-01, Vol.12, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_3053299181
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Class Imbalance
Classifiers
Cross-Project Analysis
Datasets
Fault detection
Fault diagnosis
Impact analysis
Machine learning
Machine Learning Classifiers
Measurement
Model Generalization
Performance measurement
Performance Metrics
Performance prediction
Prediction models
Predictive models
Questions
Reliability analysis
Robustness
Software
Software engineering
Software Fault Prediction
Software reliability
Training
title Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T02%3A35%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Software%20Fault%20Prediction%20using%20Cross-Project%20Analysis:%20A%20Study%20on%20Class%20Imbalance%20and%20Model%20Generalization&rft.jtitle=IEEE%20access&rft.au=Kaliraj,%20S&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3397494&rft_dat=%3Cproquest_ieee_%3E3053299181%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3053299181&rft_id=info:pmid/&rft_ieee_id=10521510&rft_doaj_id=oai_doaj_org_article_e173c47356da4e5991645e958fa92c54&rfr_iscdi=true