Exploring the Robustness of Large Language Models for Solving Programming Problems
creator | Shirafuji, Atsushi ; Watanobe, Yutaka ; Ito, Takumi ; Morishita, Makoto ; Nakamura, Yuki ; Oda, Yusuke ; Suzuki, Jun |
description | Using large language models (LLMs) for source code has recently gained
attention. LLMs such as Codex and ChatGPT, both Transformer-based models, have
been shown to be highly capable of solving a wide range of programming
problems. However, whether LLMs understand problem descriptions and generate
programs accordingly, or merely retrieve source code from the most relevant
problem in the training data based on superficial cues, remains unclear. To
explore this research question, we conduct experiments on the robustness of
several popular LLMs, the CodeGen and GPT-3.5 series models, which are capable
of tackling code generation tasks on introductory programming problems. Our
experimental results show that CodeGen and Codex are sensitive to superficial
modifications of problem descriptions, which significantly affect code
generation performance. Furthermore, we observe that Codex relies on variable
names, as randomizing variables decreases the solved rate significantly.
However, state-of-the-art (SOTA) models, such as InstructGPT and ChatGPT, show
higher robustness to superficial modifications and an outstanding capability
for solving programming problems. This highlights the fact that slight
modifications to the prompts given to LLMs can greatly affect code generation
performance, and that careful formatting of prompts is essential for
high-quality code generation, while SOTA models are becoming more robust to
perturbations. |
doi_str_mv | 10.48550/arxiv.2306.14583 |
format | Article |
creationdate | 2023-06-26 |
rights | http://creativecommons.org/licenses/by/4.0 |
link | https://arxiv.org/abs/2306.14583 |
identifier | DOI: 10.48550/arxiv.2306.14583 |
language | eng |
recordid | cdi_arxiv_primary_2306_14583 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Software Engineering |
title | Exploring the Robustness of Large Language Models for Solving Programming Problems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T18%3A18%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20the%20Robustness%20of%20Large%20Language%20Models%20for%20Solving%20Programming%20Problems&rft.au=Shirafuji,%20Atsushi&rft.date=2023-06-26&rft_id=info:doi/10.48550/arxiv.2306.14583&rft_dat=%3Carxiv_GOX%3E2306_14583%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |