PanGu-Coder: Program Synthesis with Function-Level Language Modeling

Bibliographic Details
Main Authors: Christopoulou, Fenia, Lampouras, Gerasimos, Gritta, Milan, Zhang, Guchun, Guo, Yinpeng, Li, Zhongqi, Zhang, Qi, Xiao, Meng, Shen, Bo, Li, Lin, Yu, Hao, Yan, Li, Zhou, Pingyi, Wang, Xin, Ma, Yuchi, Iacobacci, Ignacio, Wang, Yasheng, Liang, Guangtai, Wei, Jiansheng, Jiang, Xin, Wang, Qianxiang, Liu, Qun
Format: Article
Language: English
Description
Summary: We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves performance equivalent to or better than that of similarly sized models, such as CodeX, while attending over a smaller context window and training on less data.
DOI:10.48550/arxiv.2207.11280
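
The two objectives named in the abstract, Causal Language Modelling and Masked Language Modelling, are standard language-modelling losses. The sketch below is a minimal, hypothetical illustration in PyTorch of how a next-token (CLM) loss and a masked-position (MLM) loss over a tokenised description/function pair might be combined in the second training stage; it is not the authors' code, and the function names, the `alpha` weighting, and the assumption that the model returns per-position vocabulary logits are illustrative assumptions.

```python
import torch.nn.functional as F


def clm_loss(logits, input_ids):
    # Causal LM objective: each position predicts the following token,
    # so targets are the input sequence shifted left by one.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )


def mlm_loss(logits, mlm_labels):
    # Masked LM objective: only masked positions contribute; all other
    # positions carry the ignore label -100.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        mlm_labels.reshape(-1),
        ignore_index=-100,
    )


def stage2_step(model, pair_ids, mlm_labels, alpha=1.0):
    # `pair_ids` is a tokenised <natural language definition, code function>
    # pair and `alpha` is an assumed weighting between the two losses,
    # not a value taken from the paper.
    logits = model(pair_ids)  # assumed shape: [batch, seq_len, vocab_size]
    return clm_loss(logits, pair_ids) + alpha * mlm_loss(logits, mlm_labels)
```

In this simplified sketch the CLM term is computed over the whole concatenated pair; the paper's exact masking and weighting of the description versus the code tokens may differ.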