Studying the Impact of Noises in Build Breakage Data
Much research has investigated the common reasons for build breakages. However, prior research has paid little attention to builds that may break due to reasons that are unlikely to be related to development activities. For example, Continuous Integration (CI) builds may break due to timeout or conn...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on software engineering 2021-09, Vol.47 (9), p.1998-2011 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Much research has investigated the common reasons for build breakages. However, prior research has paid little attention to builds that may break due to reasons that are unlikely to be related to development activities. For example, Continuous Integration (CI) builds may break due to timeout or connection errors while generating the build. Such kinds of build breakages potentially introduce noises to build breakage data. Not considering such noises may lead to misleading results when studying CI builds. In this paper, we propose three criteria to identify build breakages that can potentially introduce noises to build breakage data. We apply these criteria to a dataset of 350,246 builds from 153 GitHub projects that are linked with Travis CI . Our results reveal that 33 percent of the build breakages are due to environmental factors (e.g., errors in CI servers), 29 percent are due to (unfixed) errors in previous builds, and 9 percent are due to build jobs that were later deemed by developers as noisy (there is an overlap of 17 percent between these three types of breakages). We measure the impact of noises in build breakage data on modeling build breakages. We observe that models that use uncleaned build breakage data can lead to misleading associations between build breakages and development activities (e.g., the role of developer). However, such associations could not be observed after eliminating noisy build breakages. Moreover, we replicate a prior study that investigates the association between build breakages and development activities using data from 14 GitHub projects. We observe that some observations reported by the prior study (e.g., pull requests cause more breakages) do not hold after eliminating the noises from build breakage data. |
---|---|
ISSN: | 0098-5589 1939-3520 |
DOI: | 10.1109/TSE.2019.2941880 |