VcLLM: Video Codecs are Secretly Tensor Codecs
Saved in:
Main authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | As the parameter size of large language models (LLMs) continues
to expand, the large memory footprint and high communication bandwidth they
require have become significant bottlenecks for LLM training and inference.
To mitigate these bottlenecks, various tensor compression techniques have
been proposed to reduce the data size, thereby alleviating memory
requirements and communication pressure.
Our research found that video codecs, despite being originally designed for
compressing videos, show excellent efficiency when compressing various types
of tensors. We demonstrate that video codecs can serve as versatile,
general-purpose tensor codecs while achieving state-of-the-art compression
efficiency across a variety of tasks. We further make use of the hardware
video encoding and decoding modules available on GPUs to create a framework
capable of both inference and training with video codecs repurposed as
tensor codecs. This greatly reduces the required memory capacity and
communication bandwidth, enabling training and inference of large models on
consumer-grade GPUs. |
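Repurposing a video codec for tensors implies mapping floating-point tensor values into the integer pixel range a codec consumes. The sketch below shows only that affine float-to-frame mapping and its inverse; the function names and the 8-bit depth are illustrative assumptions, and the actual video encode/decode step (e.g., via a GPU's hardware NVENC/NVDEC engines, as the paper describes) is elided entirely.

```python
# Illustrative sketch only: quantize tensor values into the 0..255 pixel
# range of an 8-bit video frame and back. The real pipeline would pass the
# resulting "frames" through a hardware video encoder/decoder (elided here).

def quantize_to_frame(values, levels=256):
    """Affine-quantize floats into the 0..levels-1 pixel range."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    pixels = [round((v - lo) / scale) for v in values]
    return pixels, lo, scale

def dequantize_from_frame(pixels, lo, scale):
    """Invert the affine quantization after (hypothetical) video decoding."""
    return [p * scale + lo for p in pixels]

# Round trip: with 8-bit depth the reconstruction error per value is
# bounded by half a quantization step (scale / 2).
tensor = [0.013, -0.22, 0.97, 0.0, -1.5]
pixels, lo, scale = quantize_to_frame(tensor)
recovered = dequantize_from_frame(pixels, lo, scale)
err = max(abs(a - b) for a, b in zip(tensor, recovered))
```

In a lossy setting, the video codec would further perturb the pixel values, trading reconstruction error for the compression efficiency the abstract claims.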
DOI: | 10.48550/arxiv.2407.00467 |