Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel
The SZZ algorithm is used to connect bug-fixing commits to the earlier commits that introduced bugs. This algorithm has many applications and many variants have been devised. However, there are some types of commits that cannot be traced by the SZZ algorithm, referred to as "ghost commits"...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The SZZ algorithm is used to connect bug-fixing commits to the earlier
commits that introduced bugs. This algorithm has many applications and many
variants have been devised. However, there are some types of commits that
cannot be traced by the SZZ algorithm, referred to as "ghost commits". The
evaluation of how these ghost commits impact the SZZ algorithm remains limited.
Moreover, these algorithms have been evaluated on datasets created by software
engineering researchers from information in bug trackers and version controlled
histories. Since Oct 2013, the Linux kernel developers have started labelling
bug-fixing patches with the commit identifiers of the corresponding
bug-inducing commit(s) as a standard practice. As of v6.1-rc5, 76,046 pairs of
bug-fixing patches and bug-inducing commits are available. This provides a
unique opportunity to evaluate the SZZ algorithm on a large dataset that has
been created and reviewed by project developers, entirely independently of the
biases of software engineering researchers.
In this paper, we apply six SZZ algorithms to 76,046 pairs of bug-fixing
patches and bug-introducing commits from the Linux kernel. Our findings reveal
that SZZ algorithms experience a more significant decline in recall on our
dataset (13.8%) as compared to prior findings reported by Rosa et al., and the
disparities between the individual SZZ algorithms diminish. Moreover, we find
that 17.47% of bug-fixing commits are ghost commits. Finally, we propose
Tracing-Commit SZZ (TC-SZZ), that traces all commits in the change history of
lines modified or deleted in bug-fixing commits. Applying TC-SZZ to all failure
cases, excluding ghost commits, we found that TC-SZZ could identify 17.7% of
them. Our further analysis found that 34.6% of bug-inducing commits were in the
function history, 27.5% in the file history (but not in the function history),
and... |
---|---|
DOI: | 10.48550/arxiv.2308.05060 |