The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs


Bibliographic Details
Published in: IEEE Transactions on Software Engineering, Dec. 2015, Vol. 41, No. 12, pp. 1236-1256
Authors: Le Goues, Claire; Holtschulte, Neal; Smith, Edward K.; Brun, Yuriy; Devanbu, Premkumar; Forrest, Stephanie; Weimer, Westley
Format: Article
Language: English
Abstract: The field of automated software repair lacks a set of common benchmark problems. Although benchmark sets are used widely throughout computer science, existing benchmarks are not easily adapted to the problem of automatic defect repair, which has several special requirements. Most important of these is the need for benchmark programs with reproducible, important defects and a deterministic method for assessing if those defects have been repaired. This article details the need for a new set of benchmarks, outlines requirements, and then presents two datasets, ManyBugs and IntroClass, consisting between them of 1,183 defects in 15 C programs. Each dataset is designed to support the comparative evaluation of automatic repair algorithms asking a variety of experimental questions. The datasets have empirically defined guarantees of reproducibility and benchmark quality, and each study object is categorized to facilitate qualitative evaluation and comparisons by category of bug or program. The article presents baseline experimental results on both datasets for three existing repair methods, GenProg, AE, and TrpAutoRepair, to reduce the burden on researchers who adopt these datasets for their own comparative evaluations.
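The abstract's "deterministic method for assessing if those defects have been repaired" refers to a test-based oracle: a candidate patch counts as a repair only if the patched program passes every test in the benchmark's suite. The following is a minimal sketch of that idea; the function and test names are illustrative assumptions, not the actual ManyBugs/IntroClass harness scripts.

```python
# Hypothetical sketch of a deterministic, test-based repair oracle.
# A candidate patch is accepted only if the patched program passes
# every test case (both the originally-passing tests, to guard against
# regressions, and the originally-failing tests that expose the defect).

def evaluate_candidate(run_test, test_ids):
    """Run each test against the patched program.

    run_test: callable mapping a test id to True (pass) / False (fail);
              in a real harness this would invoke the compiled C program.
    Returns (repaired, per-test results).
    """
    results = {tid: bool(run_test(tid)) for tid in test_ids}
    return all(results.values()), results

# Toy stand-in for a patched program's test harness: one test still fails.
def toy_harness(tid):
    return tid != "neg-2"

repaired, detail = evaluate_candidate(
    toy_harness, ["pos-1", "pos-2", "neg-1", "neg-2"])
print(repaired)  # False: the candidate fails test neg-2, so it is not a repair
```

Because the verdict is a pure function of the test outcomes, two research groups running the same patch against the same suite reach the same accept/reject decision, which is what makes cross-tool comparisons on these benchmarks reproducible.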
DOI: 10.1109/TSE.2015.2454513
ISSN: 0098-5589
EISSN: 1939-3520
Source: IEEE Electronic Library (IEL)
Subjects: IntroClass
ManyBugs
Algorithms
Automated program repair
Automation
benchmark
Benchmark testing
Benchmarks
C (programming language)
C language
Categories
Computer bugs
Computer programs
Computer science
Datasets
Debugging
Defects
Electronic mail
Maintenance engineering
Repair
Reproducibility
Software
Software systems
Studies
subject defect