Context-based Transfer Learning for Structuring Fault Localization and Program Repair Automation

Automated software debugging plays a crucial role in aiding software developers to swiftly identify and attempt to rectify faults, thereby significantly reducing developers’ workload. Previous researches have predominantly relied on simplistic semantic deep learning or statistical analysis methods t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on software engineering and methodology 2024-11
Hauptverfasser: Zhang, Lehuan, Guo, Shikai, Guo, Yi, Li, Hui, Chai, Yu, Chen, Rong, Li, Xiaochen, Jiang, He
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Automated software debugging plays a crucial role in aiding software developers to swiftly identify and attempt to rectify faults, thereby significantly reducing developers’ workload. Previous researches have predominantly relied on simplistic semantic deep learning or statistical analysis methods to locate faulty statements in diverse projects. However, code repositories often consist of lengthy sequences with long-distance dependencies, posing challenges for accurately modeling fault localization using these methods. In addition, the lack of joint reasoning among various faults prevents existing models from deeply capturing fault information. To address these challenges, we propose a method named CodeHealer to achieve accurate fault localization and program repair. CodeHealer comprises three components: a Deep Semantic Information Extraction Component that effectively extracts deep semantic features from suspicious code statements using classifiers based on Joint-attention mechanisms; a Suspicious Statement Ranking Component that combines various fault localization features and employs multilayer perceptrons to derive multidimensional vectors of suspicion values; and a Fault Repair Component that, based on ranked suspicious statements generated by fault localization, adopts a top-down approach using multiple classifiers based on Co-teaching mechanisms to select repair templates and generate patches. The experimental results indicate that when applied to fault localization, CodeHealer outperforms the best baseline method with improvements of 11.4%, 2.7%, and 1.6% on Top-1/3/5 metrics, respectively. It also reduces the MFR and MAR by 9.8% and 2.1%, where lower values denote better fault localization effectiveness. Additionally, in automated software debugging, CodeHealer fixes an additional 6 faults compared to the current best method, totaling 53 faults repaired.
ISSN:1049-331X
1557-7392
DOI:10.1145/3705302