CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models
Format: Article
Language: English
Abstract: Large Language Models (LLMs) need to adapt to continuous changes in data, tasks, and user preferences. Because of their massive size and the high cost of training, LLMs are not suited to frequent retraining, yet updates are necessary to keep them in sync with rapidly evolving human knowledge. To address these challenges, this paper proposes the Compression Memory Training (CMT) method, an efficient and effective online adaptation framework for LLMs with robust knowledge retention capabilities. Inspired by human memory mechanisms, CMT compresses and extracts information from new documents and stores it in a memory bank. When answering queries related to these new documents, the model aggregates the relevant document memories from the memory bank to better answer user questions. The parameters of the LLM itself do not change during training or inference, reducing the risk of catastrophic forgetting. To enhance memory encoding, retrieval, and aggregation, we further propose three general and flexible techniques: a memory-aware objective, self-matching, and top-aggregation. Extensive experiments on three continual learning datasets (i.e., StreamingQA, SQuAD, and ArchivalQA) demonstrate that the proposed method improves model adaptability and robustness across multiple base LLMs (e.g., +4.07 EM and +4.19 F1 on StreamingQA with Llama-2-7b).
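
To make the write/retrieve/aggregate flow described in the abstract concrete, here is a minimal sketch of a frozen-LLM memory-bank loop. It is not the paper's implementation: it assumes each document is compressed into a single fixed-size vector, retrieval is cosine similarity, and aggregation is a top-k average; the names MemoryBank, compress_document, and DIM are hypothetical, and the hash-based compressor merely stands in for the paper's trained memory encoder.

```python
# Illustrative sketch only; memory format, retrieval rule, and aggregation are assumptions.
import numpy as np

DIM = 64  # assumed size of a compressed document memory


def compress_document(text: str, dim: int = DIM) -> np.ndarray:
    """Stand-in for the trainable compressor: map text to one memory vector.
    A real system would use encoder activations; token hashing is for demo only."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class MemoryBank:
    """Stores compressed document memories; the base LLM's weights stay frozen."""

    def __init__(self) -> None:
        self.memories: list[np.ndarray] = []

    def write(self, doc: str) -> None:
        # Online adaptation step: compress the new document and append its memory.
        self.memories.append(compress_document(doc))

    def retrieve(self, query: str, top_k: int = 2) -> list[int]:
        # Score stored memories against the query by cosine similarity.
        q = compress_document(query)
        scores = [float(q @ m) for m in self.memories]
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:top_k]

    def aggregate(self, query: str, top_k: int = 2) -> np.ndarray:
        # Top-k aggregation: average the highest-scoring memories, which would
        # then condition the frozen LLM (e.g., as extra context or soft prompts).
        idx = self.retrieve(query, top_k)
        return np.mean([self.memories[i] for i in idx], axis=0)


if __name__ == "__main__":
    bank = MemoryBank()
    bank.write("The 2024 AI safety summit was held in Seoul.")
    bank.write("A new exoplanet was reported orbiting a nearby star.")
    print(bank.retrieve("Where was the AI safety summit held?"))
```

Because only the memory bank grows while the LLM parameters stay fixed, adding a new document never overwrites previously learned weights, which is the property the abstract credits with reducing catastrophic forgetting.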
DOI: 10.48550/arxiv.2412.07393