High-performance area-efficient polynomial ring processor for CRYSTALS-Kyber on FPGAs
Published in: | Integration (Amsterdam) 2021-05, Vol.78, p.25-35 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Quantum resistance is a new design criterion for cryptographic algorithms in the era of quantum supremacy. Lattice-based cryptography is proved to be secure against quantum computing. CRYSTALS-Kyber is a promising lattice-based candidate in the post-quantum cryptography standardization process. This paper proposes a high-performance polynomial ring processor for the CRYSTALS-Kyber algorithm. The processor executes optimized polynomial ring arithmetic, which reduces the number of modular multiplications and additions by more than 20% and 50%, respectively, compared with straightforward implementations. In addition, the forward and inverse Number Theoretic Transform (NTT) reuse the control logic with the help of an efficient configurable butterfly unit, which minimizes the area of the finite state machine. Further, the underlying dual-column sequential storage scheme removes the memory-access bottleneck. To evaluate the performance, a fully pipelined architecture is implemented on a low-cost FPGA platform. Benefiting from these optimizations, the Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs, and it achieves a speedup by a factor of 2.1 compared with the state-of-the-art implementation.
• Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs on a low-cost FPGA.
• Saves more than 20% of the modular multiplication operations in polynomial ring arithmetic.
• Optimized NTT signal flow reuses the loop control logic to save nearly 50% of the resources.
• Dual-column sequential storage improves memory bandwidth.
• Configurable butterfly unit supports Cooley–Tukey butterfly-based forward NTT, Gentleman–Sande butterfly-based inverse NTT, and other meta operations. |
---|---|
ISSN: | 0167-9260 1872-7522 |
DOI: | 10.1016/j.vlsi.2020.12.005 |
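
The configurable butterfly unit named in the highlights can be illustrated with a minimal software sketch. This is not the paper's hardware design: the function name `butterfly`, its `mode` argument, and the sample values are hypothetical and chosen only for illustration. The sketch assumes the Kyber modulus q = 3329 and uses 17, a primitive 256th root of unity modulo 3329, as the twiddle factor.

```python
Q = 3329  # Kyber modulus (assumed here; the record itself does not state it)


def butterfly(a: int, b: int, w: int, mode: str) -> tuple[int, int]:
    """One radix-2 butterfly over Z_Q, selectable between two forms.

    mode "ct" (Cooley-Tukey, used in forward NTTs):  (a + w*b, a - w*b)
    mode "gs" (Gentleman-Sande, used in inverse NTTs): (a + b, (a - b)*w)
    The name and signature are hypothetical, not taken from the paper.
    """
    if mode == "ct":
        t = (w * b) % Q          # multiply before the add/subtract
        return (a + t) % Q, (a - t) % Q
    if mode == "gs":
        # add/subtract first, then multiply the difference
        return (a + b) % Q, ((a - b) * w) % Q
    raise ValueError(f"unknown butterfly mode: {mode!r}")


# Illustrative values only.
a, b, w = 1234, 567, 17
x, y = butterfly(a, b, w, "ct")
w_inv = pow(w, -1, Q)            # modular inverse (Python 3.8+)
u, v = butterfly(x, y, w_inv, "gs")
```

Applying the Gentleman–Sande form with the inverse twiddle after the Cooley–Tukey form returns the original inputs scaled by 2, so `u == 2*a % Q` and `v == 2*b % Q`. A full inverse NTT compensates for this accumulated factor with a final scaling step. Both forms use one multiplication, one addition, and one subtraction, which is what makes sharing a single datapath between them plausible.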