NegMerge: Consensual Weight Negation for Strong Machine Unlearning
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Machine unlearning aims to selectively remove specific knowledge from a
model. Current methods, such as task arithmetic, rely on fine-tuning models on
the forget set, generating a task vector, and subtracting it from the original
model. However, we argue the effectiveness of this approach is highly sensitive
to hyperparameter selection, necessitating careful validation to identify the
best model among many fine-tuned candidates. In this paper, we propose a novel
method that leverages all given fine-tuned models rather than selecting a
single one. By constructing task vectors from models trained with varied
hyperparameters and merging only the components of the task vectors with
consistent signs, we perform unlearning by negating the merged task vector from
the original model. Given that existing methods also utilize multiple
fine-tuned models, our approach delivers more effective unlearning without
incurring additional computational costs. We demonstrate the effectiveness of
our method on both vision-language models and standard image classification
models, showing improved unlearning performance with minimal degradation on the
retain set, outperforming state-of-the-art techniques. |
---|---|
DOI: | 10.48550/arxiv.2410.05583 |
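
The procedure outlined in the summary (build one task vector per fine-tuned model, keep only the components whose signs agree, subtract the result from the original model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the representation of weights as flat Python lists, the `scale` coefficient, and the choice to average the sign-consistent components are all assumptions; the summary says only that consistent components are merged.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, original: Weights) -> Weights:
    """Task vector: fine-tuned weights minus original weights, per element."""
    return {
        name: [f - o for f, o in zip(finetuned[name], original[name])]
        for name in original
    }


def merge_consistent(task_vectors: List[Weights]) -> Weights:
    """Keep an element only if its sign agrees across every task vector.

    Agreeing elements are averaged (an assumption of this sketch);
    all other elements are set to zero.
    """
    merged: Weights = {}
    for name in task_vectors[0]:
        # Each column holds one element's value in every task vector.
        columns = zip(*(tv[name] for tv in task_vectors))
        merged[name] = [
            sum(col) / len(col)
            if all(v > 0 for v in col) or all(v < 0 for v in col)
            else 0.0
            for col in columns
        ]
    return merged


def negate(original: Weights, merged: Weights, scale: float = 1.0) -> Weights:
    """Subtract the (optionally scaled) merged task vector from the original."""
    return {
        name: [o - scale * m for o, m in zip(original[name], merged[name])]
        for name in original
    }


# Toy example: two models fine-tuned on the forget set with different settings.
original = {"w": [1.0, 1.0, 1.0, 1.0]}
finetuned_models = [
    {"w": [1.5, 0.5, 1.25, 0.5]},
    {"w": [2.0, 1.5, 0.75, 0.75]},
]

vectors = [task_vector(ft, original) for ft in finetuned_models]
merged = merge_consistent(vectors)
unlearned = negate(original, merged)
print(unlearned)  # {'w': [0.25, 1.0, 1.0, 1.375]}
```

In the toy example only the first and last elements move, because the two task vectors disagree in sign on the middle two. Elements where the fine-tuned models disagree are left untouched.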