Task-oriented Memory-efficient Pruning-Adapter
Saved in:
Main authors: , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: The outstanding performance and growing size of Large Language Models have led to increased attention to parameter-efficient learning. The two predominant approaches are Adapters and Pruning. Adapters freeze the model and add a small new weight matrix on the side, which significantly reduces training time and memory, but at the cost of increased time and memory consumption during evaluation and testing. Pruning cuts off some weights and redistributes the remaining ones, trading extremely high memory use and training time for relatively low evaluation and testing costs. Training efficiency and inference efficiency therefore cannot be obtained at the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieves high memory efficiency during training, speeds up training, and ensures no significant decrease in accuracy on GLUE tasks, achieving training and inference efficiency at the same time.
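To make the contrast between the two approaches concrete, here is a minimal PyTorch sketch, not the paper's implementation: a bottleneck adapter placed beside a frozen linear layer (the Adapter pattern) and a simple magnitude-pruning helper (one common Pruning pattern). The names `AdapterLinear`, `magnitude_prune`, and the bottleneck size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdapterLinear(nn.Module):
    """Hypothetical adapter: frozen base layer plus a small trainable side path."""

    def __init__(self, base: nn.Linear, bottleneck: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.down = nn.Linear(base.in_features, bottleneck)   # trainable
        self.up = nn.Linear(bottleneck, base.out_features)    # trainable
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op add-on
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the small side path; only the side path is trained,
        # but both must run at inference time (the extra cost the abstract notes).
        return self.base(x) + self.up(torch.relu(self.down(x)))

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Hypothetical magnitude pruning: zero the smallest-magnitude weights,
    leaving a sparse matrix that is cheap at evaluation and testing time."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

# Usage sketch: adapt a layer for training, then prune it for inference.
layer = nn.Linear(768, 768)
adapted = AdapterLinear(layer, bottleneck=16)
y = adapted(torch.randn(2, 768))
layer.weight.data = magnitude_prune(layer.weight.data, sparsity=0.5)
```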
DOI: 10.48550/arxiv.2303.14704