FS3change: A Scalable Method for Change Pattern Mining

Bibliographic details
Published in: IEEE Transactions on Software Engineering, 2023-06, Vol. 49 (6), p. 3616-3629
Main authors: Janke, Mario; Mader, Patrick
Format: Article
Language: English
Description
Abstract: Mining change patterns can provide unique insight into the evolution of dynamically changing systems such as social relation graphs, web links, hardware descriptions, and models. A more recent focus is source code change pattern mining, which may qualitatively justify expected patterns or uncover unexpected ones. These patterns then offer a basis, e.g., for programming language evolution or auto-completion support. We present a change pattern mining method that greatly expands the limits on input data size and pattern complexity compared to existing methods. We propose scalability solutions on the conceptual and algorithmic levels, thereby evolving the state-of-the-art sampling-based frequent subgraph mining method FS³, resulting in a 75% reduction in memory consumption and a speedup of 6500 for a large-scale dataset. Patterns can have hundreds of thousands of occurrences, for which manual review is impossible and may lead to misinterpretation. We propose the novel content track approach for interactively exploring pattern contents in context, based on marginal distributions. We evaluate our approach by mining 1,000 open source projects contributing a total of 558 million changes and 2 billion contextual connections among them, thereby demonstrating its scalability. A manual interpretation of 19 patterns shows that the mined patterns are sensible, allows implications for language design to be deduced, and demonstrates the soundness of the approach.
ISSN: 0098-5589, 1939-3520
DOI: 10.1109/TSE.2023.3269500