BinaryBERT: Pushing the Limit of BERT Quantization
Format: Article
Language: English
Online access: Order full text
Abstract: The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization. We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape. Therefore, we propose ternary weight splitting, which initializes BinaryBERT by equivalently splitting from a half-sized ternary network. The binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting. Empirical results show that our BinaryBERT has only a slight performance drop compared with the full-precision model while being 24x smaller, achieving state-of-the-art compression results on the GLUE and SQuAD benchmarks.
DOI: 10.48550/arxiv.2012.15701
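The "ternary weight splitting" mentioned in the abstract can be made concrete with a toy example: a ternary tensor with values in {-alpha, 0, +alpha} can be split elementwise into two binary tensors whose sum reproduces it exactly, so the wider binary model starts from the ternary model's solution. The NumPy sketch below is a simplified illustration under assumed conventions; the function names (`ternarize`, `split_ternary`), the TWN-style threshold, and the specific zero-splitting rule are not taken from the paper, whose actual construction also splits the latent full-precision weights so the new architecture can be fine-tuned after splitting.

```python
import numpy as np

def ternarize(w, threshold=0.5):
    """Quantize a real-valued tensor to {-alpha, 0, +alpha}.

    Weights with |w| below threshold * mean(|w|) are zeroed; alpha is the
    mean magnitude of the surviving weights (a common TWN-style choice,
    assumed here for illustration).
    """
    delta = threshold * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask, alpha

def split_ternary(w_t, alpha):
    """Split a ternary tensor into two binary tensors whose sum equals it.

    +alpha -> (+alpha/2, +alpha/2)
     0     -> (+alpha/2, -alpha/2)
    -alpha -> (-alpha/2, -alpha/2)
    Each half takes only two values (binary), and b1 + b2 == w_t.
    """
    half = alpha / 2.0
    b1 = np.where(w_t >= 0, half, -half)  # zeros mapped to +half here
    b2 = np.where(w_t > 0, half, -half)   # zeros mapped to -half here
    return b1, b2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8))           # stand-in for a BERT weight matrix
    w_t, alpha = ternarize(w)
    b1, b2 = split_ternary(w_t, alpha)
    assert np.allclose(b1 + b2, w_t)      # the split is exact at initialization
    print("alpha =", alpha, "| split reproduces ternary:", np.allclose(b1 + b2, w_t))
```

Because the two binary halves sum back to the ternary weights, the split binary network initially computes the same function as the half-sized ternary one, which matches the abstract's claim that the binary model inherits the ternary model's performance before further fine-tuning.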