1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
Format: Article
Language: English
Online access: Order full text
Abstract: Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Specifically, we develop a set of kernels to support fast and lossless inference of ternary BitNet b1.58 LLMs on CPUs. Extensive experiments demonstrate that bitnet.cpp achieves significant speedups, ranging from 2.37x to 6.17x on x86 CPUs and from 1.37x to 5.07x on ARM CPUs, across various model sizes. The code is available at https://github.com/microsoft/BitNet.
DOI: 10.48550/arxiv.2410.16144
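
To make the abstract's claim concrete: in a ternary (1.58-bit) model, every weight is in {-1, 0, +1}, so each "multiplication" in a matrix-vector product degenerates into an addition, a subtraction, or a skip. The sketch below illustrates this idea only; it is not the actual bitnet.cpp kernel code, which uses packed weight representations, lookup tables, and SIMD. The function name `ternary_gemv` and the plain int8 weight layout are assumptions made purely for illustration.

```cpp
// Minimal sketch of a ternary matrix-vector product, the core operation
// behind BitNet b1.58 inference. Illustrative only: the real bitnet.cpp
// kernels pack weights far more compactly and vectorize the inner loop.
#include <cstdint>
#include <cstdio>
#include <vector>

// Computes y = (W * x) * scale, with every W[i][j] in {-1, 0, +1}.
// Because weights are ternary, no floating-point multiply is needed
// per weight -- only add, subtract, or skip.
void ternary_gemv(const std::vector<int8_t>& W,  // rows*cols ternary weights
                  const std::vector<float>& x,   // cols input activations
                  std::vector<float>& y,         // rows outputs
                  int rows, int cols, float scale) {
    for (int i = 0; i < rows; ++i) {
        float acc = 0.0f;
        const int8_t* w = &W[static_cast<size_t>(i) * cols];
        for (int j = 0; j < cols; ++j) {
            if (w[j] == 1)       acc += x[j];
            else if (w[j] == -1) acc -= x[j];
            // w[j] == 0: contributes nothing, no work at all
        }
        y[i] = acc * scale;  // per-tensor scale from quantization
    }
}

int main() {
    const int rows = 2, cols = 4;
    std::vector<int8_t> W = {  1, 0, -1, 1,    // row 0
                              -1, 1,  0, 0 };  // row 1
    std::vector<float> x = { 0.5f, -1.0f, 2.0f, 0.25f };
    std::vector<float> y(rows);
    ternary_gemv(W, x, y, rows, cols, /*scale=*/1.0f);
    printf("y = [%f, %f]\n", y[0], y[1]);  // expected: [-1.25, -1.5]
    return 0;
}
```

Replacing multiplies with adds/subtracts (and skipping zeros entirely) is the source of the speed and energy gains the abstract reports; the paper's lossless claim refers to the kernels computing exactly the same results as full-precision evaluation of the ternary model.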