Greener yet Powerful: Taming Large Code Generation Models with Quantization

Bibliographic Details
Main Authors: Wei, Xiaokai; Gonugondla, Sujan; Ahmad, Wasi; Wang, Shiqi; Ray, Baishakhi; Qian, Haifeng; Li, Xiaopeng; Kumar, Varun; Wang, Zijian; Tian, Yuchen; Sun, Qing; Athiwaratkun, Ben; Shang, Mingyue; Ramanathan, Murali Krishna; Bhatia, Parminder; Xiang, Bing
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Software Engineering
Online Access: Order full text
description ML-powered code generation aims to assist developers in writing code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, however, the huge number of model parameters poses a significant challenge to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a large carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models, typically for vision or textual data. Among the available compression techniques, we identify quantization as the most applicable to the code generation task, since it does not require significant retraining cost. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from the integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a 6B-parameter model on a regular laptop without significant degradation in accuracy or robustness. We further find that the recipe is readily applicable to the code summarization task as well.
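The core mechanism the abstract refers to, storing weights as int8 integers instead of 32-bit floats, can be tried with off-the-shelf post-training dynamic quantization in PyTorch. The sketch below is only an illustration under stated assumptions, not the recipe evaluated in the paper: the model checkpoint (Salesforce/codegen-350M-mono), the prompt, and the generation settings are placeholders chosen for a quick CPU demo, whereas the paper targets models up to 6B parameters. The arithmetic behind the laptop claim is simple: 6B parameters at 4 bytes each (fp32) is roughly 24 GB of weights, while at 1 byte each (int8) it is roughly 6 GB.

    # Post-training dynamic int8 quantization of a code generation model (illustrative sketch).
    # Assumption: a small CodeGen checkpoint stands in for the larger models studied in the
    # paper; nn.Linear layers hold most of the parameters, so they are the ones quantized.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Salesforce/codegen-350M-mono"  # illustrative choice, not the paper's model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()

    # Replace fp32 nn.Linear modules with dynamically quantized int8 versions:
    # weights are stored as int8, activations are quantized on the fly at inference time.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # Generate a code completion on CPU from a natural-language-free code prompt.
    prompt = "def fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = quantized_model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

On CPU this roughly quarters the memory held by the quantized layers relative to fp32; how much accuracy and robustness survive at a given bit width is exactly the trade-off the paper measures.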
doi_str_mv 10.48550/arxiv.2303.05378
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2303.05378
language eng
recordid cdi_arxiv_primary_2303_05378
source arXiv.org
subjects Computer Science - Learning
Computer Science - Software Engineering
title Greener yet Powerful: Taming Large Code Generation Models with Quantization
url https://arxiv.org/abs/2303.05378