On the Instability of Softmax Attention-based Deep Learning Models in Side-channel Analysis
In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that the attack efficacy could be significantly improved by combining information from t...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on information forensics and security 2024-01, Vol.19, p.1-1 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that the attack efficacy could be significantly improved by combining information from the sparsely occurring PoIs. In Deep Learning (DL), a common approach for combining the information from the sparsely occurring PoIs is softmax attention. This work studies the training instability of the softmax attention-based CNN models on long traces. We show that the softmax attention-based CNN model incurs an unstable training problem when applied to longer traces (e.g., traces having a length greater than 10K sample points). We also explore the use of batch normalization and multi-head softmax attention to make the CNN models stable. Our results show that the use of a large number of batch normalization layers and/or multi-head softmax attention (replacing the vanilla softmax attention) can make the models significantly more stable, resulting in better attack efficacy. Moreover, we found our models to achieve similar or better results (up to 85% reduction in the minimum number of the required traces to reach the guessing entropy 1) than the state-of-the-art results on several synchronized and desynchronized datasets. Finally, by plotting the loss surface of the DL models, we demonstrate that using multi-head softmax attention instead of vanilla softmax attention in the CNN models can make the loss surface significantly smoother. |
---|---|
ISSN: | 1556-6013 1556-6021 |
DOI: | 10.1109/TIFS.2023.3326667 |