Integrating Non-Fourier and AST-Structural Relative Position Representations Into Transformer-Based Model for Source Code Summarization

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 9871-9889
Main authors: Liang, Hsiang-Mei; Huang, Chin-Yu
Format: Article
Language: English
Description
Summary: Source code summaries play a crucial role in helping programmers comprehend the behavior of source code functions. Recent deep-learning-based approaches to Source Code Summarization have increasingly focused on Transformer-based models. These models use self-attention mechanisms to overcome the long-range dependency issue that previous models often encounter, making them a promising solution for the Source Code Summarization task. However, these models suffer from two shortcomings: 1) they are weak at handling the semantics of keywords, and 2) they are weak at learning source code with complex structure. To resolve these shortcomings, our study proposes integrating Non-Fourier and AST-Structural relative position representations into a Transformer-based model for Source Code Summarization, which we have named NFASRPR-TRANS. NFASRPR-TRANS employs two types of positional encoding schemes in two different Transformer encoders. The first encoder handles the semantics of the keywords of the input source code sequence by using the Gaussian Embedder to encode the non-Fourier relative position representation of the sequence. The second encoder uses Tree Positional Encoding to learn the structural information of the Abstract Syntax Trees (ASTs), which provides relative position information in the ASTs for generating the source code summaries. Finally, we compared NFASRPR-TRANS with previous models and evaluated its performance on the Java and Python datasets using five metrics: BLEU, ROUGE-L, CIDEr, METEOR, and SPICE. NFASRPR-TRANS achieves 2%-10% improvements across all five metrics on both datasets.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3354390