Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
Saved in:
Main authors: | Alexandrov, Anton; Raychev, Veselin; Müller, Mark Niklas; Zhang, Ce; Vechev, Martin; Toutanova, Kristina |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning |
Online access: | Order full text |
creator | Alexandrov, Anton; Raychev, Veselin; Müller, Mark Niklas; Zhang, Ce; Vechev, Martin; Toutanova, Kristina |
---|---|
description | As open-weight large language models (LLMs) achieve ever more impressive
performances across a wide range of tasks in English, practitioners aim to
adapt these models to different languages. However, such language adaptation is
often accompanied by catastrophic forgetting of the base model's capabilities,
severely limiting the usefulness of the resulting model. We address this issue
by proposing Branch-and-Merge (BaM), a new adaptation method based on
iteratively merging multiple models, fine-tuned on a subset of the available
training data. BaM is based on the insight that this yields lower magnitude but
higher quality weight changes, reducing forgetting of the source domain while
maintaining learning on the target domain. We demonstrate in an extensive
empirical study on Bulgarian and German that BaM can significantly reduce
forgetting while matching or even improving target domain performance compared
to both standard continued pretraining and instruction finetuning across
different model architectures. |
format | Article |
identifier | DOI: 10.48550/arxiv.2407.08699 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Learning |
title | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging |
url | https://arxiv.org/abs/2407.08699 |
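The description field above outlines the Branch-and-Merge (BaM) recipe only at a high level: split the adaptation data, fine-tune separate branches, merge their weights, and repeat. The sketch below is one possible reading of that idea, not the authors' implementation; the helper names (`fine_tune`, `merge`, `branch_and_merge`), the toy linear model, the uniform weight average, and the two-branch schedule are all illustrative assumptions, and the paper itself targets LLM continued pretraining rather than toy regression.

```python
# Hedged sketch of the Branch-and-Merge (BaM) idea from the abstract:
# branch the current model onto different slices of the adaptation data,
# fine-tune each branch, average the resulting weights, and iterate.
# All names and the toy model are illustrative only.
import copy
import torch
import torch.nn as nn


def fine_tune(model: nn.Module, data, lr: float = 1e-2, steps: int = 10) -> nn.Module:
    """Fine-tune a copy of `model` on one data slice (toy regression objective)."""
    branch = copy.deepcopy(model)
    opt = torch.optim.SGD(branch.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in data:
            opt.zero_grad()
            loss = nn.functional.mse_loss(branch(x), y)
            loss.backward()
            opt.step()
    return branch


def merge(models) -> nn.Module:
    """Average the weights of several branches into a single merged model."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return merged


def branch_and_merge(model: nn.Module, data_slices, iterations: int = 2, branches: int = 2) -> nn.Module:
    """One possible reading of BaM: repeatedly branch on data subsets, then merge."""
    for _ in range(iterations):
        # each branch is fine-tuned on a different subset of the available data
        branch_models = [
            fine_tune(model, data_slices[i % len(data_slices)]) for i in range(branches)
        ]
        model = merge(branch_models)
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    base = nn.Linear(4, 1)
    # toy "target-domain" data: two slices of (input, target) batches
    slices = [[(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)] for _ in range(2)]
    adapted = branch_and_merge(base, slices)
    print("merged weight norm:", adapted.weight.norm().item())
```

The only design point the sketch tries to capture is the one stated in the abstract: because each branch sees only a slice of the data, its weight change relative to the shared starting point stays small, and averaging the branches keeps the merged model close to the base model while still moving it toward the target domain.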