SUPERMERGE: An Approach For Gradient-Based Model Merging
Format: Article
Language: English
Abstract: Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic,
monolithic, and possess the superpower to simultaneously support thousands of
tasks. However, high-throughput applications often prefer smaller task-specific
models because of their lower latency and cost. One challenge of using
task-specific models is the incremental need for solving newer tasks after the
model is already deployed for existing tasks. A straightforward solution
requires fine-tuning the model again for both existing and new tasks, which is
computationally expensive and time-consuming. To address this issue, we propose
a model-merging-based approach called SUPERMERGE. SUPERMERGE is a
gradient-based method to systematically merge several fine-tuned models trained
on existing and new tasks. SUPERMERGE is designed to be lightweight and fast,
and the merged model achieves similar performance to fully fine-tuned models on
all tasks. Furthermore, we propose a hierarchical model merging strategy to
reduce the peak space requirement without sacrificing the performance of the
merged model. We experimentally demonstrate that SUPERMERGE outperforms
existing model merging methods on common natural language processing and
computer vision tasks.
DOI: 10.48550/arxiv.2412.10416
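The abstract above describes SUPERMERGE only at a high level, so the sketch below is a minimal, illustrative reading of what "gradient-based model merging" can look like: the fine-tuned models are expressed as task vectors over a shared base model, one merging coefficient is learned per (model, parameter tensor) by gradient descent on a small validation set, and the merged model is materialized from the learned coefficients. This is not the authors' released implementation; the per-tensor coefficient parameterization, the classification cross-entropy objective, and every name here (`merge_with_learned_weights`, `alpha`, `val_loader`, ...) are assumptions made for illustration.

```python
# Illustrative sketch of gradient-based model merging (not the SUPERMERGE code).
# Assumes PyTorch >= 2.0, a shared base model, several fine-tuned copies of it,
# and a small labeled validation loader for a classification task.
import copy
import itertools
import torch
import torch.nn.functional as F


def merge_with_learned_weights(base_model, finetuned_models, val_loader,
                               steps=200, lr=1e-2, device="cpu"):
    """Learn one merging coefficient per (fine-tuned model, parameter tensor)
    by minimizing the merged model's loss on a small validation set."""
    base_model = base_model.to(device).eval()  # freeze dropout / BN statistics
    base_state = {k: v.detach().clone() for k, v in base_model.state_dict().items()}
    # Only floating-point tensors are merged; integer buffers are kept as-is.
    merge_names = [k for k, v in base_state.items() if v.is_floating_point()]

    # Task vectors: per-tensor difference between each fine-tuned model and the base.
    task_vectors = []
    for m in finetuned_models:
        sd = m.state_dict()
        task_vectors.append({k: sd[k].detach().to(device) - base_state[k]
                             for k in merge_names})

    # One learnable coefficient per (model, tensor), initialized to 1/N.
    alpha = torch.full((len(task_vectors), len(merge_names)),
                       1.0 / len(task_vectors), device=device, requires_grad=True)
    optimizer = torch.optim.Adam([alpha], lr=lr)
    batches = itertools.cycle(val_loader)

    def merged_state():
        state = dict(base_state)
        for j, name in enumerate(merge_names):
            delta = sum(alpha[i, j] * tv[name] for i, tv in enumerate(task_vectors))
            state[name] = base_state[name] + delta
        return state

    for _ in range(steps):
        x, y = next(batches)
        x, y = x.to(device), y.to(device)
        # functional_call runs the model with the merged weights, so the loss
        # is differentiable with respect to the merging coefficients alpha.
        logits = torch.func.functional_call(base_model, merged_state(), (x,))
        loss = F.cross_entropy(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Materialize the final merged model with the learned coefficients.
    merged_model = copy.deepcopy(base_model)
    with torch.no_grad():
        merged_model.load_state_dict(merged_state())
    return merged_model
```

Under the same assumptions, the hierarchical strategy mentioned in the abstract could be realized by applying such a routine to small groups of fine-tuned models and then merging the intermediate merged models, so that only a few model copies need to be held in memory at any one time.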