MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization

Bibliographic details
Published in: Pattern Recognition, 2025-02, Vol. 158, p. 111061, Article 111061
Main authors: Zhong, Yunshan; Zhou, Yuyao; Chao, Fei; Ji, Rongrong
Format: Article
Language: English
Description
Summary: Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements at runtime. In this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, which limits performance. To address this issue, we propose MBQuant, a novel method that uses a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2 bits and the activations remain at the input bit-width. To carry out forward propagation at a desired bit-width, MBQuant selects multiple branches such that the computational cost matches that of the desired bit-width. By fixing the weight bit-width, MBQuant substantially reduces the quantization errors caused by switching weight bit-widths. Additionally, we observe that the first branch suffers from quantization errors caused by all bit-widths, leading to performance degradation. We therefore introduce an amortization branch selection strategy that amortizes these errors: the first branch is selected only for certain bit-widths rather than universally, so that the errors are distributed more evenly among the branches. Finally, we adopt an in-place distillation strategy that uses the largest bit-width to guide the other bit-widths, further enhancing MBQuant's performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is publicly available at https://github.com/zysxmu/MBQuant.
• We find that existing arbitrary bit-width methods suffer from quantization errors that accumulate when weight and activation bit-widths are switched.
• We introduce MBQuant, which uses a multi-branch topology and an amortization strategy to address the accumulated quantization errors.
• Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width methods.
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.111061
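
The branch mechanism described in the summary can be illustrated with a minimal sketch. It is not taken from the authors' repository, and every name in it is hypothetical. It assumes that a request for b-bit weights is served by b/2 branches of 2-bit weights, that branch outputs are combined by summation, and that activations are left unquantized for brevity. The `select_rotated` rule is one arbitrary way to avoid always picking the first branch; the summary does not specify the actual amortization rule, so this stands in for it only as an illustration.

```python
from collections import Counter


def quantize_2bit(weights):
    """Uniformly quantize floats to 4 levels (2 bits), returned as dequantized floats."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / 3  # 2 bits -> 4 levels -> 3 intervals
    return [lo + round((w - lo) / step) * step for w in weights]


def branches_needed(bits):
    """Assumption: one 2-bit branch per 2 bits of requested weight width."""
    if bits < 2 or bits % 2:
        raise ValueError("this sketch only handles even bit-widths >= 2")
    return bits // 2


def select_prefix(bits, num_branches):
    """Naive selection: always take the first k branches."""
    k = branches_needed(bits)
    if k > num_branches:
        raise ValueError("not enough branches for this bit-width")
    return list(range(k))


def select_rotated(bits, num_branches):
    """Hypothetical amortized selection: rotate the starting branch with k."""
    k = branches_needed(bits)
    if k > num_branches:
        raise ValueError("not enough branches for this bit-width")
    start = k % num_branches
    return [(start + i) % num_branches for i in range(k)]


def forward(x, branch_weights, selected):
    """Sum the dot products of x with the 2-bit weights of the selected branches."""
    total = 0.0
    for b in selected:
        wq = quantize_2bit(branch_weights[b])
        total += sum(xi * wi for xi, wi in zip(x, wq))
    return total


def usage_counts(select, bit_widths, num_branches):
    """Count how many of the supported bit-widths use each branch."""
    counts = Counter()
    for bits in bit_widths:
        counts.update(select(bits, num_branches))
    return [counts[b] for b in range(num_branches)]
```

With four branches and bit-widths 2, 4, 6 and 8, `usage_counts(select_prefix, [2, 4, 6, 8], 4)` gives `[4, 3, 2, 1]`, so the first branch is used by every bit-width. The rotated rule gives `[2, 3, 2, 3]`, which spreads the load more evenly. That is the intuition behind amortizing errors across branches, though the published strategy may differ.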