Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization
Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different proj...
Published in: | IEEE access 2024-01, Vol.12, p.1-1 |
---|---|
Main authors: | Kaliraj, S ; Kishoore, A M ; Sivakumar, V |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 12 |
creator | Kaliraj, S ; Kishoore, A M ; Sivakumar, V |
description | Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges. |
doi_str_mv | 10.1109/ACCESS.2024.3397494 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_3053299181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10521510</ieee_id><doaj_id>oai_doaj_org_article_e173c47356da4e5991645e958fa92c54</doaj_id><sourcerecordid>3053299181</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-3f598f22ebfc443671e14d73b6298eba542a6340926c0780487cbe5d61e4cfe93</originalsourceid><addsrcrecordid>eNpNUV1r20AQFKGBmMS_oH046LOc-5aub0YkqcEhBifPx-q0CucquvROori_PnIUQvZll2FmFmay7DujK8aouV5X1c1-v-KUy5UQppBGnmULzrTJhRL625f7IlumdKDTlBOkikX2Zx_a4R9EJLcwdgPZRWy8G3zoyZh8_0yqGFLKdzEc0A1k3UN3TD79ImuyH8bmSCZi1UFKZPNSQwe9QwJ9Q-5Dgx25wx4jdP4_nByvsvMWuoTLj32ZPd3ePFa_8-3D3aZab3MnlBly0SpTtpxj3TophS4YMtkUotbclFiDkhy0kNRw7WhRUlkWrkbVaIbStWjEZbaZfZsAB_sa_QvEow3g7TsQ4rOFOHjXoUVWCCcLoXQDEpUxTEuFRpUtGO6UnLx-zl6vMfwdMQ32EMY4pZCsoErwSVGyiSVmljulFbH9_MqoPZVk55LsqST7UdKk-jGrPCJ-USjOFKPiDZ9qjLs</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053299181</pqid></control><display><type>article</type><title>Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Kaliraj, S ; Kishoore, A M ; Sivakumar, V</creator><creatorcontrib>Kaliraj, S ; Kishoore, A M ; Sivakumar, V</creatorcontrib><description>Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. 
In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. 
This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3397494</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Class Imbalance ; Classifiers ; Cross-Project Analysis ; Datasets ; Fault detection ; Fault diagnosis ; Impact analysis ; Machine learning ; Machine Learning Classifiers ; Measurement ; Model Generalization ; Performance measurement ; Performance Metrics ; Performance prediction ; Prediction models ; Predictive models ; Questions ; Reliability analysis ; Robustness ; Software ; Software engineering ; Software Fault Prediction ; Software reliability ; Training</subject><ispartof>IEEE access, 2024-01, Vol.12, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c359t-3f598f22ebfc443671e14d73b6298eba542a6340926c0780487cbe5d61e4cfe93</cites><orcidid>0000-0001-9910-726X ; 0000-0003-4212-8427</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10521510$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,27633,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Kaliraj, S</creatorcontrib><creatorcontrib>Kishoore, A M</creatorcontrib><creatorcontrib>Sivakumar, V</creatorcontrib><title>Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization</title><title>IEEE access</title><addtitle>Access</addtitle><description>Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. 
Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.</description><subject>Accuracy</subject><subject>Class Imbalance</subject><subject>Classifiers</subject><subject>Cross-Project Analysis</subject><subject>Datasets</subject><subject>Fault detection</subject><subject>Fault diagnosis</subject><subject>Impact analysis</subject><subject>Machine learning</subject><subject>Machine Learning Classifiers</subject><subject>Measurement</subject><subject>Model Generalization</subject><subject>Performance measurement</subject><subject>Performance Metrics</subject><subject>Performance prediction</subject><subject>Prediction models</subject><subject>Predictive models</subject><subject>Questions</subject><subject>Reliability analysis</subject><subject>Robustness</subject><subject>Software</subject><subject>Software engineering</subject><subject>Software Fault 
Prediction</subject><subject>Software reliability</subject><subject>Training</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUV1r20AQFKGBmMS_oH046LOc-5aub0YkqcEhBifPx-q0CucquvROori_PnIUQvZll2FmFmay7DujK8aouV5X1c1-v-KUy5UQppBGnmULzrTJhRL625f7IlumdKDTlBOkikX2Zx_a4R9EJLcwdgPZRWy8G3zoyZh8_0yqGFLKdzEc0A1k3UN3TD79ImuyH8bmSCZi1UFKZPNSQwe9QwJ9Q-5Dgx25wx4jdP4_nByvsvMWuoTLj32ZPd3ePFa_8-3D3aZab3MnlBly0SpTtpxj3TophS4YMtkUotbclFiDkhy0kNRw7WhRUlkWrkbVaIbStWjEZbaZfZsAB_sa_QvEow3g7TsQ4rOFOHjXoUVWCCcLoXQDEpUxTEuFRpUtGO6UnLx-zl6vMfwdMQ32EMY4pZCsoErwSVGyiSVmljulFbH9_MqoPZVk55LsqST7UdKk-jGrPCJ-USjOFKPiDZ9qjLs</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Kaliraj, S</creator><creator>Kishoore, A M</creator><creator>Sivakumar, V</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9910-726X</orcidid><orcidid>https://orcid.org/0000-0003-4212-8427</orcidid></search><sort><creationdate>20240101</creationdate><title>Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization</title><author>Kaliraj, S ; Kishoore, A M ; Sivakumar, V</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-3f598f22ebfc443671e14d73b6298eba542a6340926c0780487cbe5d61e4cfe93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Class Imbalance</topic><topic>Classifiers</topic><topic>Cross-Project Analysis</topic><topic>Datasets</topic><topic>Fault detection</topic><topic>Fault diagnosis</topic><topic>Impact analysis</topic><topic>Machine learning</topic><topic>Machine Learning Classifiers</topic><topic>Measurement</topic><topic>Model Generalization</topic><topic>Performance measurement</topic><topic>Performance Metrics</topic><topic>Performance prediction</topic><topic>Prediction models</topic><topic>Predictive models</topic><topic>Questions</topic><topic>Reliability analysis</topic><topic>Robustness</topic><topic>Software</topic><topic>Software engineering</topic><topic>Software Fault Prediction</topic><topic>Software reliability</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kaliraj, S</creatorcontrib><creatorcontrib>Kishoore, A M</creatorcontrib><creatorcontrib>Sivakumar, V</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kaliraj, S</au><au>Kishoore, A M</au><au>Sivakumar, V</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2024-01-01</date><risdate>2024</risdate><volume>12</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. 
In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. 
This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3397494</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-9910-726X</orcidid><orcidid>https://orcid.org/0000-0003-4212-8427</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024-01, Vol.12, p.1-1 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_3053299181 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Accuracy ; Class Imbalance ; Classifiers ; Cross-Project Analysis ; Datasets ; Fault detection ; Fault diagnosis ; Impact analysis ; Machine learning ; Machine Learning Classifiers ; Measurement ; Model Generalization ; Performance measurement ; Performance Metrics ; Performance prediction ; Prediction models ; Predictive models ; Questions ; Reliability analysis ; Robustness ; Software ; Software engineering ; Software Fault Prediction ; Software reliability ; Training |
title | Software Fault Prediction using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T02%3A35%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Software%20Fault%20Prediction%20using%20Cross-Project%20Analysis:%20A%20Study%20on%20Class%20Imbalance%20and%20Model%20Generalization&rft.jtitle=IEEE%20access&rft.au=Kaliraj,%20S&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3397494&rft_dat=%3Cproquest_ieee_%3E3053299181%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3053299181&rft_id=info:pmid/&rft_ieee_id=10521510&rft_doaj_id=oai_doaj_org_article_e173c47356da4e5991645e958fa92c54&rfr_iscdi=true |