Towards automatically generating block comments for code snippets

Code commenting is a common programming practice of practical importance to help developers review and comprehend source code. There are two main types of code comments for a method: header comments that summarize the method functionality located before a method, and block comments that describe the...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information and software technology 2020-11, Vol.127, p.106373, Article 106373
Hauptverfasser: Huang, Yuan, Huang, Shaohao, Chen, Huanchao, Chen, Xiangping, Zheng, Zibin, Luo, Xiapu, Jia, Nan, Hu, Xinyu, Zhou, Xiaocong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Code commenting is a common programming practice of practical importance to help developers review and comprehend source code. There are two main types of code comments for a method: header comments that summarize the method functionality located before a method, and block comments that describe the functionality of the code snippets within a method. Inspired by the effectiveness of deep learning techniques in the NLP field, many studies focus on using the machine translation model to automatically generate comment for the source code. Because the data set of block comments is difficult to collect, current studies focus more on the automatic generation of header comments than that of block comments. However, block comments are important for program comprehension due to their explanation role for the code snippets in a method. To fill the gap, we have proposed an approach that combines heuristic rules and learning-based method to collect a large number of comment-code pairs from 1,032 open source projects in our previous study. In this paper, we propose a reinforcement learning-based method, RL-BlockCom, to automatically generate block comments for code snippets based on the collected comment-code pairs. Specifically, we utilize the abstract syntax tree (i.e., AST) of a code snippet to generate a token sequence with a statement-based traversal way. Then we propose a composite learning model, which combines the actor-critic algorithm of reinforcement learning with the encoder-decoder algorithm, to generate block comments. On the data set of the comment-code pairs, the BLEU-4 score of our method is 24.28, which outperforms the baselines and state-of-the-art in comment generation.
ISSN:0950-5849
1873-6025
DOI:10.1016/j.infsof.2020.106373