Early prediction of merged code changes to prioritize reviewing tasks

Bibliographic details

Published in:	Empirical software engineering : an international journal, 2018-12, Vol.23 (6), p.3346-3393
Main authors:	Fan, Yuanrui; Xia, Xin; Lo, David; Li, Shanping
Format:	Article
Language:	eng
Subjects:	Building codes; Compilers; Computer Science; Feature extraction; Interpreters; Machine learning; Mathematical models; Programming Languages; Reviewing; Software Engineering/Programming and Operating Systems
Online access:	Full text

container_end_page	3393
container_issue	6
container_start_page	3346
container_title	Empirical software engineering : an international journal
container_volume	23
creator	Fan, Yuanrui; Xia, Xin; Lo, David; Li, Shanping
description	Modern Code Review (MCR) has been widely adopted by open source and proprietary software projects. Inspecting code changes costs reviewers much time and effort, since they need to comprehend patches, and many reviewers are assigned to review many code changes. Moreover, a code change may eventually be abandoned, wasting the time and effort spent reviewing it. Thus, a tool that predicts early on whether a code change will be merged can help developers prioritize the changes they inspect, accomplish more under tight schedules, and avoid wasting review effort on low-quality changes. Motivated by these needs, we build a merged code change prediction tool. Our approach first extracts 34 features from code changes, grouped into 5 dimensions: code, file history, owner experience, collaboration network, and text. We then leverage machine learning techniques such as random forest to build a prediction model. To evaluate our approach, we conduct experiments on three open source projects (i.e., Eclipse, LibreOffice, and OpenStack) containing a total of 166,215 code changes. Across the three datasets, our approach statistically significantly outperforms random guess classifiers and the two prediction models proposed by Jeong et al. (2009) and Gousios et al. (2014) in terms of several evaluation metrics. We also study the important features that distinguish merged code changes from abandoned ones.
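The description above motivates ranking pending changes by how likely they are to be merged. As a minimal sketch, assuming some already-trained classifier (for instance a random forest, as the description mentions) has assigned each change a merge probability, the Python below orders a review queue by that probability and scores the ranking with ROC AUC. The `CodeChange` type, the `prioritize` and `auc` functions, and the sample scores are hypothetical illustrations, not the authors' tool or evaluation code; AUC is shown only as one common threshold-free metric, and whether it is among the paper's "several evaluation metrics" should be checked against the full text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CodeChange:
    """A pending code change (hypothetical record type for this sketch)."""
    change_id: str            # hypothetical identifier, e.g. a review ID
    merge_probability: float  # assumed output of an already-trained classifier
    merged: bool              # ground truth, known only once review has closed


def prioritize(changes: Sequence[CodeChange]) -> List[CodeChange]:
    """Return changes ordered so the likeliest-to-merge are reviewed first."""
    return sorted(changes, key=lambda c: c.merge_probability, reverse=True)


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Over every (merged, abandoned) pair, count how often the merged change
    received the higher score, with ties counting as half a win. A random
    guesser is expected to land near 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one merged and one abandoned change")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the paper.
    history = [
        CodeChange("c1", 0.91, True),
        CodeChange("c2", 0.15, False),
        CodeChange("c3", 0.62, True),
        CodeChange("c4", 0.40, False),
        CodeChange("c5", 0.70, False),
    ]
    print([c.change_id for c in prioritize(history)])  # ['c1', 'c5', 'c3', 'c4', 'c2']
    print(auc([c.merge_probability for c in history],
              [c.merged for c in history]))  # 0.8333333333333334 (= 5/6)
```

The pairwise loop costs O(merged × abandoned) comparisons, which is fine for a sketch; for datasets on the scale of the 166,215 changes studied here, a rank-based O(n log n) formulation of the same statistic would be the practical choice.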
doi_str_mv	10.1007/s10664-018-9602-0
format	Article
publisher	New York: Springer US
rights	Springer Science+Business Media, LLC, part of Springer Nature 2018
stitle	Empir Software Eng
tpages	48
orcidid	https://orcid.org/0000-0002-6302-3256
identifier	ISSN: 1382-3256
ispartof	Empirical software engineering : an international journal, 2018-12, Vol.23 (6), p.3346-3393
issn	1382-3256 (print); 1573-7616 (electronic)
language	eng
recordid	cdi_proquest_journals_2139133737
source	Springer Nature - Complete Springer Journals
subjects	Building codes; Compilers; Computer Science; Feature extraction; Interpreters; Machine learning; Mathematical models; Programming Languages; Reviewing; Software Engineering/Programming and Operating Systems
title	Early prediction of merged code changes to prioritize reviewing tasks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T13%3A02%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Early%20prediction%20of%20merged%20code%20changes%20to%20prioritize%20reviewing%20tasks&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Fan,%20Yuanrui&rft.date=2018-12-01&rft.volume=23&rft.issue=6&rft.spage=3346&rft.epage=3393&rft.pages=3346-3393&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1007/s10664-018-9602-0&rft_dat=%3Cproquest_cross%3E2139133737%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2139133737&rft_id=info:pmid/&rfr_iscdi=true