Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Despite the remarkable success of Large Language Models (LLMs), their massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and the difficulty of collecting suitable data. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning but suffer notable performance decline in the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank Adaptation (LoRA) into the \(L_0\) regularization during instruction tuning. We further augment the pruning algorithm with a collaborative prompt that fosters collaboration between the LLM and the pruning algorithm, significantly boosting overall performance. As a result, Compresso prunes LLaMA-7B to 5.4B parameters while maintaining original performance and even surpassing LLaMA-7B in reading comprehension by 2.62%. Extensive experiments demonstrate that Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.
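The mechanism the abstract describes pairs a differentiable \(L_0\) sparsity objective with LoRA, so pruning decisions can be learned during instruction tuning while the base weights stay frozen. The sketch below is a minimal, hypothetical PyTorch illustration of those two ingredients: a hard-concrete gate per prunable unit (e.g., an attention head) and a LoRA-wrapped linear layer. Class names, hyperparameters, and the gate granularity are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch, assuming a hard-concrete L0 gate (Louizos et al., 2018)
# and a standard LoRA adapter; all names and defaults are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardConcreteGate(nn.Module):
    """Differentiable 0/1 gate per prunable unit (e.g., per attention head)."""
    def __init__(self, num_units, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_units))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Sample a relaxed Bernoulli via the hard-concrete reparameterization.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta), then clip to [0, 1] so exact zeros are possible.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Expected number of non-zero gates; used as the sparsity penalty.
        return torch.sigmoid(
            self.log_alpha - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).sum()

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus scaled low-rank correction B @ A @ x.
        return self.base(x) + F.linear(F.linear(x, self.lora_a), self.lora_b) * self.scale
```

In such a setup, only the gate parameters and the LoRA matrices would receive gradients; the training objective would combine the instruction-tuning loss with a penalty proportional to `expected_l0()`, and structured units whose gates collapse to zero would be removed after training.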
Published in: | arXiv.org 2023-10 |
---|---|
Main authors: | Guo, Song; Xu, Jiahang; Li Lyna Zhang; Mao, Yang |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Collaboration; Cooperation; Data collection; Large language models; Reading comprehension; Regularization; Training |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Guo, Song; Xu, Jiahang; Li Lyna Zhang; Mao, Yang |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2876196930 |
source | Free E-Journals |
subjects | Algorithms; Collaboration; Cooperation; Data collection; Large language models; Reading comprehension; Regularization; Training |
title | Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models |