Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis
The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the prop...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The advent of large pre-trained language models in the domain of Code
Synthesis has shown remarkable performance on various benchmarks, treating the
problem of Code Generation in a fashion similar to Natural Language Generation,
trained with a Language Modelling (LM) objective. In addition, the property of
programming language code being precisely evaluable with respect to its
semantics -- through the use of Unit Tests to check its functional correctness
-- lends itself to using Reinforcement Learning (RL) as a further training
paradigm. Previous work has shown that RL can be applied as such to improve
models' coding capabilities; however, such RL-based methods rely on a reward
signal based on defined Unit Tests, which are much harder to obtain compared to
the huge crawled code datasets used in LM objectives. In this work, we present
a novel approach to automatically obtain data consisting of function signatures
and associated Unit Tests, suitable for RL training of Code Synthesis models.
We also introduce a straightforward, simple yet effective Actor-Critic RL
training scheme and show that it, in conjunction with automatically generated
training data, leads to improvement of a pre-trained code language model's
performance by up to 9.9% improvement over the original underlying code
synthesis LM, and up to 4.3% over RL-based models trained with standard PPO or
CodeRL. |
---|---|
DOI: | 10.48550/arxiv.2310.13669 |