Bug localization using latent Dirichlet allocation

Bibliographic details
Published in: Information and software technology 2010-09, Vol.52 (9), p.972-990
Main authors: Lukins, Stacy K., Kraft, Nicholas A., Etzkorn, Letha H.
Format: Article
Language: eng
Online access: Full text
container_end_page 990
container_issue 9
container_start_page 972
container_title Information and software technology
container_volume 52
creator Lukins, Stacy K.
Kraft, Nicholas A.
Etzkorn, Letha H.
description Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown effective in topic model based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness. We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods. We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems. The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results of the studies also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base. We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.
doi_str_mv 10.1016/j.infsof.2010.04.002
format Article
publisher Elsevier B.V. (Amsterdam)
rights 2010 Elsevier B.V.
tpages 19
toplevel peer_reviewed
identifier ISSN: 0950-5849
ispartof Information and software technology, 2010-09, Vol.52 (9), p.972-990
issn 0950-5849
1873-6025
language eng
recordid cdi_proquest_miscellaneous_760200524
source Elsevier ScienceDirect Journals
subjects Accuracy
Bug localization
Computer programs
Debugging
Dirichlet problem
Information retrieval
Large scale integration
Latent Dirichlet allocation
Localization
Open source software
Position (location)
Program comprehension
Software
Source code
Stability
Statistical methods
Studies
title Bug localization using latent Dirichlet allocation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T02%3A52%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bug%20localization%20using%20latent%20Dirichlet%20allocation&rft.jtitle=Information%20and%20software%20technology&rft.au=Lukins,%20Stacy%20K.&rft.date=2010-09-01&rft.volume=52&rft.issue=9&rft.spage=972&rft.epage=990&rft.pages=972-990&rft.issn=0950-5849&rft.eissn=1873-6025&rft_id=info:doi/10.1016/j.infsof.2010.04.002&rft_dat=%3Cproquest_cross%3E2069483781%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=527972822&rft_id=info:pmid/&rft_els_id=S0950584910000650&rfr_iscdi=true
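The description above says the technique is built around LDA but does not show how a bug report is matched against source code. The Python block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that source-code units (e.g. methods) are already pre-tokenized word lists and that LDA is fit by collapsed Gibbs sampling, which is one common inference method; the paper may use another. Each unit d is then scored against a bug report q by the query likelihood log P(q | d) = Σ_w log Σ_t φ_t(w) θ_d(t), a ranking commonly paired with LDA in retrieval. The names `fit_lda` and `rank_units`, the hyperparameters `alpha`, `beta`, and `iterations`, and the toy corpus are all hypothetical.

```python
"""Hypothetical sketch of LDA-based bug localization (illustrative only).

Assumptions (not taken from the catalog record): code units are
pre-tokenized word lists, LDA is fit with collapsed Gibbs sampling, and
units are ranked by the likelihood of the bug-report words.
"""
import math
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):       # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed point estimates of the document-topic and topic-word distributions.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return theta, phi, vocab


def rank_units(query, theta, phi, vocab):
    """Return (log P(query | unit), unit index) pairs, best match first."""
    words = [vocab[w] for w in query if w in vocab]   # out-of-vocabulary words are skipped
    scores = []
    for d, mix in enumerate(theta):
        # P(w | d) = sum over topics t of phi[t][w] * theta[d][t]
        score = sum(math.log(sum(phi[t][v] * mix[t] for t in range(len(mix))))
                    for v in words)
        scores.append((score, d))
    return sorted(scores, reverse=True)


# Usage on a hypothetical toy corpus of three code units.
units = {
    "Parser.parse": "parse token grammar syntax error token".split(),
    "Renderer.draw": "draw pixel color canvas render pixel".split(),
    "Cache.evict": "cache evict memory entry size memory".split(),
}
names = list(units)
theta, phi, vocab = fit_lda(list(units.values()), num_topics=3)
ranking = rank_units("syntax error when parsing token".split(), theta, phi, vocab)
for score, d in ranking:
    print(f"{names[d]}: {score:.3f}")
```

Query words absent from the vocabulary are skipped rather than smoothed, which keeps the sketch short. A fuller implementation would also preprocess identifiers (for example, splitting camelCase and removing stop words) before fitting; that step is likewise an assumption here.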