Controlling Overfitting in Classification-Tree Models of Software Quality
Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on those modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmo...
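Published in: | Empirical software engineering : an international journal 2001-03, Vol.6 (1), p.59-79 |
---|---|
Main authors: | Khoshgoftaar, Taghi M; Allen, Edward B |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Classification; Computer programs; Construction; Mathematical models; Modules; Software; Software quality; Studies; Training |
Online access: | Full text |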
container_end_page | 79 |
---|---|
container_issue | 1 |
container_start_page | 59 |
container_title | Empirical software engineering : an international journal |
container_volume | 6 |
creator | Khoshgoftaar, Taghi M; Allen, Edward B |
description | Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on those modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmonotonic relationships. In this paper, we apply the TREEDISC algorithm, which is a refinement of the CHAID algorithm, to build classification-tree models. CHAID-based algorithms differ from other classification-tree algorithms in their reliance on chi-squared tests when building the tree. Classification-tree models are vulnerable to overfitting, where the model reflects the structure of the training data set too closely. Even though a model appears to be accurate on training data, if overfitted, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on expected costs of misclassification, rather than the total number of misclassifications. We conducted a case study of a very large telecommunications system. A two-way analysis of variance with repetitions found that TREEDISC's significance level was highly related to overfitting, and can be used to control it. Moreover, the minimum number of modules in a leaf also influenced the degree of overfitting. [PUBLICATION ABSTRACT] |
doi_str_mv | 10.1023/A:1009803004576 |
format | Article |
fullrecord | Source: ProQuest (record ID TN_cdi_proquest_miscellaneous_26657328). Publisher: Dordrecht: Springer Nature B.V. Rights: Kluwer Academic Publishers 2001. Peer reviewed. Publication date: 2001-03-01. Pages: 21. EISSN: 1573-7616. |
fulltext | fulltext |
identifier | ISSN: 1382-3256 |
ispartof | Empirical software engineering : an international journal, 2001-03, Vol.6 (1), p.59-79 |
issn | 1382-3256; 1573-7616 |
language | eng |
recordid | cdi_proquest_miscellaneous_26657328 |
source | SpringerLink Journals - AutoHoldings |
subjects | Algorithms; Classification; Computer programs; Construction; Mathematical models; Modules; Software; Software quality; Studies; Training |
title | Controlling Overfitting in Classification-Tree Models of Software Quality |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T19%3A17%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Controlling%20Overfitting%20in%20Classification-Tree%20Models%20of%20Software%20Quality&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Khoshgoftaar,%20Taghi%20M&rft.date=2001-03-01&rft.volume=6&rft.issue=1&rft.spage=59&rft.epage=79&rft.pages=59-79&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1023/A:1009803004576&rft_dat=%3Cproquest%3E1671234131%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=879658246&rft_id=info:pmid/&rfr_iscdi=true |
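
The abstract states that overfitting is measured through expected costs of misclassification rather than the raw count of misclassifications. The record does not give the formula, so the following Python sketch is only an illustration under stated assumptions: the function names, the two cost parameters (`cost_false_alarm`, `cost_missed_fault`), their default values, and the idea of taking the difference between test-set and training-set cost are all hypothetical and are not taken from the paper.

```python
def expected_cost(actual, predicted, cost_false_alarm=1.0, cost_missed_fault=10.0):
    """Average misclassification cost per module.

    actual, predicted: equal-length sequences of booleans, True = fault-prone.
    cost_false_alarm:  assumed cost of flagging a module that is not fault-prone.
    cost_missed_fault: assumed (higher) cost of missing a fault-prone module.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one module is required")
    # Count the two kinds of error separately so each can carry its own cost.
    false_alarms = sum(1 for a, p in zip(actual, predicted) if not a and p)
    missed_faults = sum(1 for a, p in zip(actual, predicted) if a and not p)
    total = cost_false_alarm * false_alarms + cost_missed_fault * missed_faults
    return total / len(actual)


def overfitting_gap(train_actual, train_pred, test_actual, test_pred, **costs):
    """Hypothetical overfitting indicator: test-set cost minus training-set cost.

    A large positive value suggests the tree fits the training data much
    better than it fits new data.
    """
    return (expected_cost(test_actual, test_pred, **costs)
            - expected_cost(train_actual, train_pred, **costs))


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    train_y = [True, False, False, True]
    train_hat = [True, False, False, True]   # perfect on training data
    test_y = [True, False, False, True]
    test_hat = [True, True, False, False]    # one false alarm, one missed fault
    print(overfitting_gap(train_y, train_hat, test_y, test_hat))
```

Weighting a missed fault-prone module more heavily than a false alarm is what distinguishes a cost-based measure from a plain error count; in a study like the one described, such a gap could be compared across settings of the tree's significance level and minimum leaf size.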