GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. …
Saved in:
Main Authors: | Gholami, Mohsen; Akbari, Mohammad; Hu, Cindy; Masrani, Vaden; Wang, Z. Jane; Zhang, Yong |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language |
Online Access: | Order full text |
creator | Gholami, Mohsen; Akbari, Mohammad; Hu, Cindy; Masrani, Vaden; Wang, Z. Jane; Zhang, Yong |
description | Knowledge distillation from LLMs is essential for the efficient deployment of
language models. Prior works have proposed data generation using LLMs for
preparing distilled models. We argue that generating data with LLMs is prone to
sampling mainly from the center of the original content distribution. This
limitation hinders the distilled model from learning the true underlying data
distribution and causes it to forget the tails of the distribution (samples with
lower probability). To this end, we propose GOLD, a task-agnostic data generation
and knowledge distillation framework that employs an iterative
out-of-distribution-guided feedback mechanism for the LLM. As a result, the
generated data improves the generalizability of distilled models. An
energy-based OOD evaluation approach is also introduced to deal with noisy
generated data. Our extensive experiments on 10 different classification and
sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the
LLM by an average of 5% and 14%, respectively. We also show that the proposed
method is applicable to less explored and novel tasks. The code is available. |
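
The description above names two mechanisms without spelling them out: an energy-based OOD score used to filter noisy generated samples, and an iterative out-of-distribution-guided feedback loop that steers the LLM's data generation. The sketch below is a minimal, hypothetical illustration of how one round of such a loop could look, assuming the student is a standard classification model; `energy_score`, `generate_with_llm`, `student`, `tokenizer`, and all thresholds are illustrative placeholders, not the authors' actual implementation.

```python
import torch

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Standard energy score E(x) = -T * logsumexp(f(x) / T); higher energy ~ more OOD.
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def ood_guided_round(student, tokenizer, prompts, generate_with_llm,
                     noise_quantile=0.9, feedback_k=8):
    # One hypothetical iteration: generate data with the LLM, score each sample with
    # the student's energy, drop the noisiest (highest-energy) tail, and feed the most
    # OOD of the remaining samples back into the next generation prompt.
    texts, labels = generate_with_llm(prompts)             # assumed LLM generation step
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = student(**enc).logits                      # student classifier logits
    energies = energy_score(logits)
    threshold = torch.quantile(energies, noise_quantile)    # treat the extreme tail as noise
    kept = [(t, y, e.item())
            for t, y, e in zip(texts, labels, energies)
            if e <= threshold]
    kept.sort(key=lambda item: item[2], reverse=True)       # most OOD first
    feedback = [t for t, _, _ in kept[:feedback_k]]         # guides the next round's prompt
    train_set = [(t, y) for t, y, _ in kept]                # distillation data for the student
    return train_set, feedback
```

In this reading, the returned feedback examples would be embedded in the next prompt so that generation drifts toward underrepresented (lower-probability) regions of the data distribution rather than its center.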
doi_str_mv | 10.48550/arxiv.2403.19754 |
format | Article |
identifier | DOI: 10.48550/arxiv.2403.19754 |
language | eng |
recordid | cdi_arxiv_primary_2403_19754 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T15%3A00%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GOLD:%20Generalized%20Knowledge%20Distillation%20via%20Out-of-Distribution-Guided%20Language%20Data%20Generation&rft.au=Gholami,%20Mohsen&rft.date=2024-03-28&rft_id=info:doi/10.48550/arxiv.2403.19754&rft_dat=%3Carxiv_GOX%3E2403_19754%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |