Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind

Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the...

Detailed Description

Bibliographic Details
Main authors: Zeng, Hongchuan, Xu, Hongshen, Chen, Lu, Yu, Kai
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Zeng, Hongchuan; Xu, Hongshen; Chen, Lu; Yu, Kai
description Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better performance the language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques.
doi 10.48550/arxiv.2404.04748
format Article
identifier DOI: 10.48550/arxiv.2404.04748
language eng
source arXiv.org
subjects Computer Science - Computation and Language
title Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
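
The description above boils down to a simple sampling rule: draw calibration examples from each language in proportion to that language's share of the model's training data, rather than from English alone. Below is a minimal Python sketch of that proportional-sampling idea only; it is not the authors' released implementation, and the function name, corpus layout, and language shares are illustrative assumptions.

    import random

    def sample_calibration_set(corpus_by_lang, train_fractions, n_samples=128, seed=0):
        # corpus_by_lang: {lang_code: [candidate calibration texts]}  (hypothetical layout)
        # train_fractions: {lang_code: that language's share of the training data}
        rng = random.Random(seed)
        total = sum(train_fractions[lang] for lang in corpus_by_lang)
        calibration = []
        for lang, texts in corpus_by_lang.items():
            # Number of examples proportional to the language's training share;
            # rounding means the final count can differ from n_samples by a few.
            k = min(len(texts), round(n_samples * train_fractions[lang] / total))
            calibration.extend(rng.sample(texts, k))
        rng.shuffle(calibration)
        return calibration

    # Hypothetical usage with made-up language shares (not the paper's numbers):
    corpus = {
        "en": [f"english sentence {i}" for i in range(1000)],
        "zh": [f"chinese sentence {i}" for i in range(1000)],
        "sw": [f"swahili sentence {i}" for i in range(1000)],
    }
    shares = {"en": 0.30, "zh": 0.16, "sw": 0.01}
    calibration_set = sample_calibration_set(corpus, shares, n_samples=128)

The returned list would then serve as the calibration set for whichever pruning or quantization method is being applied, in place of the usual English-only calibration batch.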