Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Source code summarization aims to generate natural language summaries of
structured code snippets so that readers can better understand what the code does.
However, automatic code summarization is challenging due to the complexity of
source code and the language gap between source code and natural
language summaries. Most previous approaches are either retrieval-based
(they can exploit similar examples from the retrieval database
but generalize poorly) or generation-based (they
generalize better but cannot exploit similar
examples). This paper proposes a novel retrieval-augmented mechanism that combines
the benefits of both. Furthermore, to mitigate the limited ability of Graph
Neural Networks (GNNs) to capture global graph structure information in
source code, we propose a novel attention-based dynamic graph to complement the
static graph representation of the source code, and design a hybrid message
passing GNN that captures both local and global structural information. To
evaluate the proposed approach, we release a new challenging benchmark crawled
from diverse large-scale open-source C projects (95k+ unique
functions in total). Our method achieves state-of-the-art
performance, improving on existing methods by 1.42, 2.44 and 1.29 in terms of
BLEU-4, ROUGE-L and METEOR, respectively. |
---|---|
DOI: | 10.48550/arxiv.2006.05405 |
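
The two ideas named in the summary, retrieving a similar example and passing messages over both a static and an attention-based dynamic graph, can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no architectural details, so the token-overlap similarity, the absence of learned weights, the fixed `mix` coefficient, and every function name below are assumptions made purely for illustration.

```python
import math

def token_overlap(code_a, code_b):
    """Jaccard similarity over whitespace tokens (an assumed stand-in measure)."""
    a, b = set(code_a.split()), set(code_b.split())
    return len(a & b) / len(a | b) if (a or b) else 0.0

def retrieve(query_code, database):
    """Return the (code, summary) pair whose code is most similar to the query."""
    return max(database, key=lambda pair: token_overlap(query_code, pair[0]))

def normalize_static(adj):
    """Add self-loops to a static adjacency matrix and row-normalize it."""
    rows = []
    for i, row in enumerate(adj):
        looped = [v + (1.0 if i == j else 0.0) for j, v in enumerate(row)]
        total = sum(looped)
        rows.append([v / total for v in looped])
    return rows

def dynamic_graph(h):
    """Dense attention graph: row-wise softmax of scaled dot products."""
    scale = math.sqrt(len(h[0]))
    graph = []
    for hi in h:
        scores = [sum(x * y for x, y in zip(hi, hj)) / scale for hj in h]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        graph.append([e / total for e in exps])
    return graph

def aggregate(adj, h):
    """Weighted sum of node features for every node (adj @ h)."""
    dim = len(h[0])
    return [[sum(w * node[k] for w, node in zip(row, h)) for k in range(dim)]
            for row in adj]

def hybrid_message_passing(h, static_adj, hops=2, mix=0.5):
    """Blend local (static graph) and global (attention graph) messages.

    `mix` is a hypothetical fixed weight; a trained model would learn the fusion.
    """
    a_static = normalize_static(static_adj)
    for _ in range(hops):
        a_dynamic = dynamic_graph(h)        # recomputed from current features
        local = aggregate(a_static, h)      # neighbours in the static graph
        global_ = aggregate(a_dynamic, h)   # all nodes, attention-weighted
        h = [[mix * l + (1.0 - mix) * g for l, g in zip(lr, gr)]
             for lr, gr in zip(local, global_)]
    return h

# Toy usage: a two-entry retrieval database and a three-node code graph.
database = [
    ("int add ( int a , int b ) { return a + b ; }", "add two integers"),
    ("void swap ( int * a , int * b )", "swap two integers"),
]
retrieved_code, retrieved_summary = retrieve(
    "int add ( int x , int y ) { return x + y ; }", database)
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
static_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
updated = hybrid_message_passing(node_features, static_adj)
```

In this sketch the static graph only lets a node see its direct neighbours, while the attention graph connects every pair of nodes, which is one simple way to read the summary's distinction between local and global structural information.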