Greener yet Powerful: Taming Large Code Generation Models with Quantization

Bibliographic Details
Main Authors: Wei, Xiaokai; Gonugondla, Sujan; Ahmad, Wasi; Wang, Shiqi; Ray, Baishakhi; Qian, Haifeng; Li, Xiaopeng; Kumar, Varun; Wang, Zijian; Tian, Yuchen; Sun, Qing; Athiwaratkun, Ben; Shang, Mingyue; Ramanathan, Murali Krishna; Bhatia, Parminder; Xiang, Bing
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Software Engineering
Online Access: Order full text
description ML-powered code generation aims to assist developers in writing code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, however, the huge number of model parameters poses a significant challenge to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a large carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models, typically for vision or textual data. Among the available compression techniques, we identify quantization as the most applicable to the code generation task, since it does not require significant retraining cost. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from the integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a 6B-parameter model on a regular laptop without significant degradation in accuracy or robustness. We further find that the recipe is readily applicable to the code summarization task as well.
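The core mechanism the abstract refers to, storing weights as int8 integers instead of 32-bit floats, can be tried with off-the-shelf post-training dynamic quantization in PyTorch. The sketch below is only an illustration under stated assumptions, not the recipe evaluated in the paper: the model checkpoint (Salesforce/codegen-350M-mono), the prompt, and the generation settings are placeholders chosen for a quick CPU demo, whereas the paper targets models up to 6B parameters. The arithmetic behind the laptop claim is simple: 6B parameters at 4 bytes each (fp32) is roughly 24 GB of weights, while at 1 byte each (int8) it is roughly 6 GB.

    # Post-training dynamic int8 quantization of a code generation model (illustrative sketch).
    # Assumption: a small CodeGen checkpoint stands in for the larger models studied in the
    # paper; nn.Linear layers hold most of the parameters, so they are the ones quantized.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Salesforce/codegen-350M-mono"  # illustrative choice, not the paper's model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()

    # Replace fp32 nn.Linear modules with dynamically quantized int8 versions:
    # weights are stored as int8, activations are quantized on the fly at inference time.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # Generate a code completion on CPU from a natural-language-free code prompt.
    prompt = "def fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = quantized_model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

On CPU this roughly quarters the memory held by the quantized layers relative to fp32; how much accuracy and robustness survive at a given bit width is exactly the trade-off the paper measures.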
doi_str_mv 10.48550/arxiv.2303.05378
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2303.05378
language eng
recordid cdi_arxiv_primary_2303_05378
source arXiv.org
subjects Computer Science - Learning
Computer Science - Software Engineering
title Greener yet Powerful: Taming Large Code Generation Models with Quantization
url https://arxiv.org/abs/2303.05378