Graph Based Mining of Code Change Patterns From Version Control Commits

Detailed knowledge of frequently recurring code changes can be beneficial for a variety of software engineering activities. For example, it is a key step to understand the process of software evolution, but is also necessary when developing more sophisticated code completion features predicting like...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on software engineering 2022-03, Vol.48 (3), p.848-863
Hauptverfasser: Janke, Mario, Mader, Patrick
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Detailed knowledge of frequently recurring code changes can be beneficial for a variety of software engineering activities. For example, it is a key step to understand the process of software evolution, but is also necessary when developing more sophisticated code completion features predicting likely changes. Previous attempts on automatically finding such code change patterns were mainly based on frequent itemset mining, which essentially finds sets of edits occurring in close proximity. However, these approaches do not analyze the interplay among code elements, e.g., two code objects being named similarly, and thereby neglect great potential in identifying a number of meaningful patterns. We present a novel method for the automated mining of code change patterns from Git repositories that captures these context relations between individual edits. Our approach relies on a transformation of source code into a graph representation, while keeping relevant relations present. We then apply graph mining techniques to extract frequent subgraphs, which can be used for further analysis of development projects. We suggest multiple usage scenarios for the resulting pattern type. Additionally, we propose a transformation into complex event processing (CEP) rules which allows for easier application, especially for event-based auto-completion recommenders or similar tools. For evaluation, we mined seven open-source code repositories. We present 25 frequent change patterns occurring across these projects. We found these patterns to be meaningful, easy to interpret and mostly persistent across project borders. On average, a pattern from our set appeared in 45 percent of the analyzed code changes.
ISSN:0098-5589
1939-3520
DOI:10.1109/TSE.2020.3004892