Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Spiking Neural Networks (SNNs), known for their biologically plausible
architecture, face the challenge of limited performance. The self-attention
mechanism, which is the cornerstone of the high-performance Transformer and
also a biologically inspired structure, is absent in existing SNNs. To this
end, we explore the potential of leveraging both self-attention capability and
biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA)
and Spiking Transformer (Spikformer). The SSA mechanism eliminates the need for
softmax and captures sparse visual features using spike-based Query,
Key, and Value. This sparse computation without multiplication makes SSA
efficient and energy-saving. Further, we develop a Spiking Convolutional Stem
(SCS) with supplementary convolutional layers to enhance the architecture of
Spikformer. The Spikformer enhanced with the SCS is referred to as Spikformer
V2. To train larger and deeper Spikformer V2 models, we introduce a pioneering
exploration of Self-Supervised Learning (SSL) within SNNs. Specifically, we
pre-train Spikformer V2 in a masking-and-reconstruction style inspired by
mainstream self-supervised Transformers, and then fine-tune it on
ImageNet image classification. Extensive experiments show that
Spikformer V2 outperforms previous surrogate-training and ANN2SNN
methods. An 8-layer Spikformer V2 achieves an accuracy of 80.38% using 4 time
steps, and after SSL, a 172M-parameter 16-layer Spikformer V2 reaches an accuracy of
81.10% with just 1 time step. To the best of our knowledge, this is the first
time that an SNN has achieved over 80% accuracy on ImageNet. The code will be
available at Spikformer V2. |
---|---|
DOI: | 10.48550/arxiv.2401.02020 |
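
The abstract describes Spiking Self-Attention (SSA) as softmax-free attention over spike-based Query, Key, and Value, computed without multiplication. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names, the `scale` and `threshold` values, and the use of a simple threshold to re-binarize the output are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def heaviside_spike(x, threshold=1.0):
    """Emit a binary spike (1) wherever the input reaches the threshold.

    The threshold rule is an illustrative assumption.
    """
    return (x >= threshold).astype(np.int64)


def spiking_self_attention(q, k, v, scale=0.125, threshold=1.0):
    """Softmax-free attention over binary spike matrices (hypothetical sketch).

    q, k, v: integer arrays of shape (tokens, dim) containing only 0s and 1s.
    scale, threshold: illustrative values, not taken from the paper.
    """
    # With binary inputs, each entry of Q K^T counts the channels where both
    # tokens spiked, so it can be accumulated with additions only.
    attn = q @ k.T  # shape (tokens, tokens), non-negative integers
    # Because V is binary, each output entry is a sum of selected attention
    # counts: again accumulation, with no softmax normalization.
    out = attn @ v  # shape (tokens, dim)
    # Scale the counts and re-binarize so the result is itself a spike matrix.
    return heaviside_spike(out * scale, threshold)


# Example with random sparse spikes (about 30% of entries fire).
rng = np.random.default_rng(0)
tokens, dim = 4, 8
q = (rng.random((tokens, dim)) < 0.3).astype(np.int64)
k = (rng.random((tokens, dim)) < 0.3).astype(np.int64)
v = (rng.random((tokens, dim)) < 0.3).astype(np.int64)
spikes = spiking_self_attention(q, k, v)
print(spikes.shape)
```

Since `q`, `k`, and `v` hold only 0s and 1s, the matrix products here reduce to integer counting, which is the property the abstract points to when it calls the computation sparse and multiplication-free. NumPy still executes them as ordinary matrix multiplications; the sketch shows the arithmetic, not the energy savings of spiking hardware.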