HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation
Saved in:
Main authors:
Format: Article
Language: eng
Keywords:
Online access: Order full text
Abstract: Fine-tuning pre-trained language models for downstream tasks has achieved impressive results in NLP. However, fine-tuning all parameters becomes impractical due to the rapidly increasing size of model parameters. To address this, Parameter Efficient Fine-Tuning (PEFT) methods update only a subset of parameters. Most PEFT methods, such as LoRA, use incremental updates, which involve adding learned weight matrix increments to the original parameters. Although effective, these methods face limitations in capturing complex parameter dynamics and do not maintain a strong correlation between the original and updated parameters. To overcome these challenges, we propose the direct Updated Transformation (UT) paradigm, which constructs a transformation directly from the original to the updated parameters. This approach ensures that the correlation between the original and updated parameters is preserved, leveraging the semantic features learned during pre-training. Building on this paradigm, we present the Hadamard Updated Transformation (HUT) method. HUT efficiently updates the original weight matrix using the Hadamard transformation with two low-rank matrices, offering a more expressive and flexible update mechanism. This allows HUT to capture richer parameter features through functional transformations, reducing computational complexity while maintaining or improving model quality. Theoretical analysis and extensive experiments on RoBERTa and GPT-2 validate the effectiveness of HUT. Results show that HUT performs on par with or better than other PEFT methods in terms of model quality, while significantly reducing computational complexity.
DOI: 10.48550/arxiv.2409.13501
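
For intuition, here is a minimal PyTorch sketch of the direct Updated Transformation idea described in the abstract: the frozen pre-trained weight W is modulated element-wise (a Hadamard product) by a factor built from two trainable low-rank matrices, instead of receiving a LoRA-style additive increment. The class name `HadamardUpdateLinear`, the rank-8 default, and the specific form W ⊙ (1 + U Vᵀ) are illustrative assumptions, not the paper's exact HUT formulation.

```python
import torch
import torch.nn as nn


class HadamardUpdateLinear(nn.Module):
    """Sketch of an Updated-Transformation-style linear layer.

    The frozen weight W is transformed directly (here via an element-wise
    Hadamard product with a low-rank factor U @ V^T), rather than having a
    LoRA-style increment W + B @ A added to it. The exact HUT parameterization
    in the paper may differ; this is only a schematic of the idea.
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pre-trained W (and bias)
            p.requires_grad_(False)
        out_f, in_f = base.weight.shape
        # Two trainable low-rank matrices; U starts at zero so that U @ V^T = 0
        # and the transformed weight equals the original W before training.
        self.U = nn.Parameter(torch.zeros(out_f, rank))
        self.V = nn.Parameter(torch.randn(in_f, rank) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = self.base.weight
        # Direct transformation of W: element-wise (Hadamard) modulation by a
        # low-rank matrix, keeping the updated weight tied to the original one.
        W_updated = W * (1.0 + self.U @ self.V.T)
        return nn.functional.linear(x, W_updated, self.base.bias)


if __name__ == "__main__":
    layer = HadamardUpdateLinear(nn.Linear(768, 768), rank=8)
    y = layer(torch.randn(2, 16, 768))
    print(y.shape)                                # torch.Size([2, 16, 768])
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)                              # 2 * 768 * 8 = 12288 trainable params
```

As in LoRA, only the two rank-r factors are trained, so the trainable parameter count stays small; the difference illustrated here is that the update multiplies the original weight rather than being added to it, which keeps the updated parameters explicitly correlated with the pre-trained ones.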