GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM with average improvements of 5% and 14%, respectively. We also show that the proposed method is applicable to less-explored and novel tasks. The code is available.
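The "energy-based OOD evaluation" mentioned in the abstract refers to scoring generated samples by how out-of-distribution they appear. As a rough, hedged illustration only (not the paper's implementation), the sketch below computes the standard energy score from a classifier's logits; the temperature and threshold values are illustrative assumptions, not values from the paper.

```python
import torch


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Standard energy score: E(x) = -T * logsumexp(f(x) / T) over the class dimension.
    # Lower energy is typically assigned to in-distribution inputs, higher energy to OOD ones.
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)


def flag_noisy_samples(logits: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    # Mark generated samples whose energy exceeds a (hypothetical) threshold as OOD/noisy.
    return energy_score(logits) > threshold


# Usage sketch: logits of shape (num_generated_samples, num_classes) from a classifier.
logits = torch.randn(8, 4)
mask = flag_noisy_samples(logits, threshold=0.0)  # boolean mask of samples to treat as OOD
```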

Bibliographic Details
Main authors: Gholami, Mohsen; Akbari, Mohammad; Hu, Cindy; Masrani, Vaden; Wang, Z. Jane; Zhang, Yong
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language
Online access: Order full text
creator Gholami, Mohsen; Akbari, Mohammad; Hu, Cindy; Masrani, Vaden; Wang, Z. Jane; Zhang, Yong
description Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM with average improvements of 5% and 14%, respectively. We also show that the proposed method is applicable to less-explored and novel tasks. The code is available.
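To make the pipeline described above more concrete, here is a speculative, high-level sketch of an iterative, OOD-guided generation loop. The helper names (llm_generate, train_student, select_ood_feedback) are hypothetical placeholders for illustration, not APIs from the paper or its released code.

```python
from typing import Any, Callable, List


def ood_guided_distillation(
    llm_generate: Callable[[List[str]], List[str]],              # hypothetical: sample data from the LLM
    train_student: Callable[[List[str]], Any],                   # hypothetical: (re)train the distilled model
    select_ood_feedback: Callable[[List[str], Any], List[str]],  # hypothetical: pick OOD samples as feedback
    seed_prompts: List[str],
    rounds: int = 3,
) -> Any:
    # Each round: generate data with the LLM, distill it into the student,
    # then select samples the student treats as out-of-distribution and
    # feed them back to guide the next generation round.
    dataset: List[str] = []
    prompts = seed_prompts
    student: Any = None
    for _ in range(rounds):
        dataset.extend(llm_generate(prompts))
        student = train_student(dataset)
        prompts = select_ood_feedback(dataset, student)
    return student
```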
doi_str_mv 10.48550/arxiv.2403.19754
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2403.19754
language eng
recordid cdi_arxiv_primary_2403_19754
source arXiv.org
subjects Computer Science - Computation and Language
title GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T15%3A00%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GOLD:%20Generalized%20Knowledge%20Distillation%20via%20Out-of-Distribution-Guided%20Language%20Data%20Generation&rft.au=Gholami,%20Mohsen&rft.date=2024-03-28&rft_id=info:doi/10.48550/arxiv.2403.19754&rft_dat=%3Carxiv_GOX%3E2403_19754%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true