T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax

Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, (2) and the difference among duplicates ( i.e. , different programs that trigger the same bug) can be minimized or...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on software engineering and methodology 2024-08
Hauptverfasser:	Xu, Zhenyang, Tian, Yongqiang, Zhang, Mengxiao, Zhang, Jiarui, Liu, Puzhuo, Jiang, Yu, Sun, Chengnian
Format:	Artikel
Sprache:	eng
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, (2) and the difference among duplicates ( i.e. , different programs that trigger the same bug) can be minimized or even eliminated. With such reduction and canonicalization functionality, program reduction facilitates debugging for software, especially language toolchains, such as compilers, interpreters, and debuggers. While many program reduction techniques have been proposed, most of them (especially the language-agnostic ones) overlooked the potential reduction opportunities hidden within tokens. Therefore, their capabilities in terms of reduction and canonicalization are significantly restricted. To fill this gap, we propose T-Rec, a fine-grained language-agnostic program reduction technique guided by lexical syntax. Instead of treating tokens as atomic and irreducible components, T-Rec introduces a fine-grained reduction process that leverages the lexical syntax of programming languages to effectively explore the reduction opportunities in tokens. Through comprehensive evaluations with versatile benchmark suites, we demonstrate that T-Rec significantly improves the reduction and canonicalization capability of two existing language-agnostic program reducers ( i.e. , Perses and Vulcan). T-Rec enables Perses and Vulcan to further eliminate 1,294 and 1,315 duplicates in a benchmark suite that contains 3,796 test cases that triggers 46 unique bugs. Additionally, T-Rec can also reduce up to 65.52% and 53.73% bytes in the results of Perses and Vulcan on our multi-lingual benchmark suite, respectively.
ISSN:	1049-331X 1557-7392
DOI:	10.1145/3690631