Graph Based Mining of Code Change Patterns From Version Control Commits
Published in: | IEEE Transactions on Software Engineering 2022-03, Vol.48 (3), p.848-863 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Detailed knowledge of frequently recurring code changes can be beneficial for a variety of software engineering activities. For example, it is a key step toward understanding the process of software evolution, and it is also necessary when developing more sophisticated code completion features that predict likely changes. Previous attempts at automatically finding such code change patterns were mainly based on frequent itemset mining, which essentially finds sets of edits occurring in close proximity. However, these approaches do not analyze the interplay among code elements, e.g., two code objects being named similarly, and thereby neglect great potential in identifying a number of meaningful patterns. We present a novel method for the automated mining of code change patterns from Git repositories that captures these context relations between individual edits. Our approach relies on a transformation of source code into a graph representation that preserves the relevant relations. We then apply graph mining techniques to extract frequent subgraphs, which can be used for further analysis of development projects. We suggest multiple usage scenarios for the resulting pattern type. Additionally, we propose a transformation into complex event processing (CEP) rules, which allows for easier application, especially for event-based auto-completion recommenders or similar tools. For evaluation, we mined seven open-source code repositories. We present 25 frequent change patterns occurring across these projects. We found these patterns to be meaningful, easy to interpret, and mostly persistent across project borders. On average, a pattern from our set appeared in 45 percent of the analyzed code changes. |
---|---|
ISSN: | 0098-5589 1939-3520 |
DOI: | 10.1109/TSE.2020.3004892 |