GAP-Gen: Guided Automatic Python Code Generation

Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Pyth...

Detailed description

Bibliographic details
Main authors: Zhao, Junchen, Song, Yurun, Wang, Junlin, Harris, Ian G
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Zhao, Junchen
Song, Yurun
Wang, Junlin
Harris, Ian G
description Automatic code generation from natural language descriptions can be highly beneficial during software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, a simplified version of the Abstract Syntax Tree (AST) that reduces the size and complexity of the AST while retaining crucial syntactic information about the Python code. In addition to Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function names consistently throughout the code. Rather than pretraining, we focus on modifying the fine-tuning process, which reduces computational requirements while retaining high performance on the automatic Python code generation task. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest, and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous work.
doi_str_mv 10.48550/arxiv.2201.08810
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2201_08810</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2201_08810</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-a14f08cada4bffcecc0f046df3975b1a7b6a2449eb6fbff1fc0eb806381e9b4d3</originalsourceid><addsrcrecordid>eNotjsGOgjAURbuZhXH8AFf2B8BXKKXMjhBlTEx04Z68tq-RZACDYPTvRcfVTW5OTg5jSwGh1EkCa-zv9S2MIhAhaC1gxqDMj0FJ7Q8vx9qR4_k4dA0OteXHx3DuWl50jvhEUD-9XfvNvjz-XWnx2Tk7bTen4jfYH8pdke8DVCkEKKQHbdGhNN5bshY8SOV8nKWJEZgahZGUGRnlJ0B4C2Q0qFgLyox08Zyt_rXv5OrS1w32j-qVXr3T4yfYyD5C</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GAP-Gen: Guided Automatic Python Code Generation</title><source>arXiv.org</source><creator>Zhao, Junchen ; Song, Yurun ; Wang, Junlin ; Harris, Ian G</creator><creatorcontrib>Zhao, Junchen ; Song, Yurun ; Wang, Junlin ; Harris, Ian G</creatorcontrib><description>Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, which is a simplified version of Abstract Syntax Tree (AST) reducing the size and high complexity of Abstract Syntax Tree but maintaining crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow which abstracts variable and function names consistently through out the code. In our work, rather than pretraining, we focus on modifying the finetuning process which reduces computational requirements but retains high generation performance on automatic Python code generation task. 
GAP-Gen fine-tunes the transformer based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on automatic Python code generation task than previous works.</description><identifier>DOI: 10.48550/arxiv.2201.08810</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Learning ; Computer Science - Programming Languages ; Computer Science - Software Engineering</subject><creationdate>2022-01</creationdate><rights>http://creativecommons.org/licenses/by-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2201.08810$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2201.08810$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhao, Junchen</creatorcontrib><creatorcontrib>Song, Yurun</creatorcontrib><creatorcontrib>Wang, Junlin</creatorcontrib><creatorcontrib>Harris, Ian G</creatorcontrib><title>GAP-Gen: Guided Automatic Python Code Generation</title><description>Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. 
We first introduce Python syntactic constraints in the form of Syntax-Flow, which is a simplified version of Abstract Syntax Tree (AST) reducing the size and high complexity of Abstract Syntax Tree but maintaining crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow which abstracts variable and function names consistently through out the code. In our work, rather than pretraining, we focus on modifying the finetuning process which reduces computational requirements but retains high generation performance on automatic Python code generation task. GAP-Gen fine-tunes the transformer based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on automatic Python code generation task than previous works.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><subject>Computer Science - Programming Languages</subject><subject>Computer Science - Software Engineering</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotjsGOgjAURbuZhXH8AFf2B8BXKKXMjhBlTEx04Z68tq-RZACDYPTvRcfVTW5OTg5jSwGh1EkCa-zv9S2MIhAhaC1gxqDMj0FJ7Q8vx9qR4_k4dA0OteXHx3DuWl50jvhEUD-9XfvNvjz-XWnx2Tk7bTen4jfYH8pdke8DVCkEKKQHbdGhNN5bshY8SOV8nKWJEZgahZGUGRnlJ0B4C2Q0qFgLyox08Zyt_rXv5OrS1w32j-qVXr3T4yfYyD5C</recordid><startdate>20220119</startdate><enddate>20220119</enddate><creator>Zhao, Junchen</creator><creator>Song, Yurun</creator><creator>Wang, Junlin</creator><creator>Harris, Ian G</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20220119</creationdate><title>GAP-Gen: Guided Automatic Python Code Generation</title><author>Zhao, Junchen ; Song, Yurun ; Wang, Junlin ; Harris, Ian 
G</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-a14f08cada4bffcecc0f046df3975b1a7b6a2449eb6fbff1fc0eb806381e9b4d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><topic>Computer Science - Programming Languages</topic><topic>Computer Science - Software Engineering</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Junchen</creatorcontrib><creatorcontrib>Song, Yurun</creatorcontrib><creatorcontrib>Wang, Junlin</creatorcontrib><creatorcontrib>Harris, Ian G</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhao, Junchen</au><au>Song, Yurun</au><au>Wang, Junlin</au><au>Harris, Ian G</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GAP-Gen: Guided Automatic Python Code Generation</atitle><date>2022-01-19</date><risdate>2022</risdate><abstract>Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, which is a simplified version of Abstract Syntax Tree (AST) reducing the size and high complexity of Abstract Syntax Tree but maintaining crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow which abstracts variable and function names consistently through out the code. 
In our work, rather than pretraining, we focus on modifying the finetuning process which reduces computational requirements but retains high generation performance on automatic Python code generation task. GAP-Gen fine-tunes the transformer based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on automatic Python code generation task than previous works.</abstract><doi>10.48550/arxiv.2201.08810</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2201.08810
ispartof
issn
language eng
recordid cdi_arxiv_primary_2201_08810
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
Computer Science - Programming Languages
Computer Science - Software Engineering
title GAP-Gen: Guided Automatic Python Code Generation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A04%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GAP-Gen:%20Guided%20Automatic%20Python%20Code%20Generation&rft.au=Zhao,%20Junchen&rft.date=2022-01-19&rft_id=info:doi/10.48550/arxiv.2201.08810&rft_dat=%3Carxiv_GOX%3E2201_08810%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
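
Illustrative note on the description above: the record does not define Syntax-Flow or Variable-Flow in detail, so the following is only a minimal sketch of the two ideas using Python's standard `ast` module, not the authors' implementation. The names `syntax_flow`, `VariableFlow`, and the `func_N`/`var_N` placeholder scheme are hypothetical. The sketch assumes that a simplified syntactic view can be approximated by keeping only statement-level AST node types, and that consistent name abstraction can be approximated by mapping each user-defined identifier to one fixed placeholder.

```python
import ast


def syntax_flow(source: str) -> list[str]:
    """Hypothetical stand-in for Syntax-Flow: keep only statement node types.

    Dropping expression-level nodes shrinks the AST while preserving the
    coarse syntactic structure of the code.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)
            if isinstance(node, ast.stmt)]


class VariableFlow(ast.NodeTransformer):
    """Hypothetical stand-in for Variable-Flow: rename identifiers consistently."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}

    def _alias(self, name: str, prefix: str) -> str:
        # The same original name always maps to the same placeholder.
        if name not in self.names:
            self.names[name] = f"{prefix}_{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._alias(node.name, "func")
        self.generic_visit(node)  # visits arguments first, then the body
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._alias(node.arg, "var")
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Rename names that are assigned to or already known; names that are
        # only read and never defined here (e.g. builtins) are left untouched.
        if node.id in self.names or isinstance(node.ctx, ast.Store):
            node.id = self._alias(node.id, "var")
        return node


source = "def add(a, b):\n    total = a + b\n    return total\n"

flow = syntax_flow(source)
abstracted = ast.unparse(VariableFlow().visit(ast.parse(source)))

print(flow)
print(abstracted)
```

`ast.unparse` requires Python 3.9 or later. In a fine-tuning setup such as the one the description mentions, representations like these would presumably accompany the docstring and code pairs, but how GAP-Gen actually encodes them is not stated in this record.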