Multi-Task Model Merging via Adaptive Weight Disentanglement
Format: Article
Language: English
Abstract: Model merging has recently gained attention as an economical and scalable
approach to incorporating task-specific weights from various tasks into a unified
multi-task model. For example, in Task Arithmetic (TA), adding the fine-tuned
weights of different tasks can enhance the model's performance on those tasks,
while subtracting them leads to task forgetting. Although TA is highly
effective, interference among tasks still hampers the performance of the merged
model. Existing methods for handling conflicts between tasks generally rely on
empirical selection, resulting in suboptimal performance. In this paper, we
introduce an Adaptive Weight Disentanglement method. We begin by theoretically
proving that task vectors employed in model merging should be orthogonal to
minimize interference among tasks. Guided by this insight, we initialize
redundant vectors such that, when subtracted from the original task vectors,
the resulting vectors exhibit increased orthogonality. Additionally, we impose
a norm constraint on the redundant vectors to preserve the performance of the
task-specific models. Experimental results demonstrate the effectiveness of our
proposed technique: it successfully extracts redundant vectors, and after their
subtraction, the task vectors not only retain robust performance but also
achieve superior fusion outcomes. Our code is available at
\href{https://github.com/FarisXiong/AWD.git}{https://github.com/FarisXiong/AWD.git}.
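The abstract describes Task Arithmetic (adding scaled task vectors to a base model) and the idea of subtracting a redundant component to make task vectors more orthogonal. The sketch below is a hypothetical illustration of those two ideas on flat weight vectors; all names, the scaling coefficient `lam`, and the projection-based "redundant" component are illustrative assumptions, not the paper's actual Adaptive Weight Disentanglement procedure.

```python
import numpy as np

def task_vector(finetuned, pretrained):
    # A task vector is the fine-tuned weights minus the pre-trained weights.
    return finetuned - pretrained

def merge(pretrained, task_vectors, lam=0.3):
    # TA merging: add the scaled sum of all task vectors to the base weights.
    return pretrained + lam * np.sum(task_vectors, axis=0)

def cosine(a, b):
    # Cosine similarity between task vectors; a value near zero indicates
    # orthogonality, which the paper argues minimizes task interference.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
base = rng.normal(size=8)   # stand-in pre-trained weights
tv_a = rng.normal(size=8)   # stand-in task vector for task A
tv_b = rng.normal(size=8)   # stand-in task vector for task B

# A crude stand-in for removing a "redundant" component: subtracting the
# projection of tv_a onto tv_b makes the pair exactly orthogonal. The paper
# instead optimizes redundant vectors under a norm constraint.
redundant = (tv_a @ tv_b) / (tv_b @ tv_b) * tv_b
tv_a_clean = tv_a - redundant

merged = merge(base, [tv_a_clean, tv_b])
```

After the projection step, `cosine(tv_a_clean, tv_b)` is zero up to floating-point error, while `tv_a_clean` stays close to `tv_a` whenever the removed component is small, mirroring the abstract's norm constraint on redundant vectors.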
DOI: 10.48550/arxiv.2411.18729