High-performance area-efficient polynomial ring processor for CRYSTALS-Kyber on FPGAs


Bibliographic details
Published in: Integration (Amsterdam) 2021-05, Vol.78, p.25-35
Main authors: Chen, Zhaohui, Ma, Yuan, Chen, Tianyu, Lin, Jingqiang, Jing, Jiwu
Format: Article
Language: English
Description
Summary: Quantum resistance is a new design criterion for cryptographic algorithms in the era of quantum supremacy. Lattice-based cryptography is believed to be secure against quantum attacks. CRYSTALS-Kyber is a promising lattice-based candidate in the post-quantum cryptography standardization process. This paper proposes a high-performance polynomial ring processor for the CRYSTALS-Kyber algorithm. The processor executes optimized polynomial ring arithmetic, which reduces the number of modular multiplications and additions by more than 20% and 50%, respectively, compared with straightforward implementations. In addition, the forward and inverse Number Theoretic Transform (NTT) reuse the same control logic with the help of an efficient configurable butterfly unit, which minimizes the area of the finite state machine. Further, the underlying dual-column sequential storage scheme removes the memory access bottleneck. To evaluate the performance, a fully pipelined architecture is implemented on a low-cost FPGA platform. Benefiting from these optimizations, the Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs, a speedup by a factor of 2.1 over the state-of-the-art implementation.
• The Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs on a low-cost FPGA.
• Saves more than 20% of the modular multiplication operations in polynomial ring arithmetic.
• The optimized NTT signal flow reuses the loop control logic to save nearly 50% of the resources.
• Dual-column sequential storage improves memory bandwidth.
• The configurable butterfly unit supports the Cooley–Tukey butterfly-based forward NTT, the Gentleman–Sande butterfly-based inverse NTT, and other meta operations.
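The idea of a single butterfly unit serving both transform directions can be illustrated in software. The following is a minimal Python sketch, not the paper's hardware design: it assumes the standard Kyber parameters (q = 3329, n = 256, 17 as a primitive 256th root of unity) and a 7-layer in-place transform. The names `butterfly`, `ntt`, `intt`, and `bitrev7` are hypothetical and chosen for illustration only.

```python
# Software sketch of a configurable butterfly shared by forward and inverse NTT.
# Assumed parameters: standard Kyber values; names are illustrative, not from the paper.
Q = 3329      # Kyber modulus
N = 256       # number of polynomial coefficients
ZETA = 17     # primitive 256th root of unity modulo Q


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


# Twiddle factors in bit-reversed order, one per butterfly block.
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def butterfly(a, b, w, inverse=False):
    """One butterfly; the flag selects Cooley-Tukey or Gentleman-Sande."""
    if inverse:
        # Gentleman-Sande: add/subtract first, then multiply the difference.
        return (a + b) % Q, ((a - b) * w) % Q
    # Cooley-Tukey: multiply first, then add/subtract.
    t = (w * b) % Q
    return (a + t) % Q, (a - t) % Q


def ntt(poly):
    """Forward transform: 7 layers of Cooley-Tukey butterflies."""
    r = list(poly)
    k = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            w = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                r[j], r[j + length] = butterfly(r[j], r[j + length], w)
        length //= 2
    return r


def intt(poly):
    """Inverse transform: same loop shape, Gentleman-Sande butterflies."""
    r = list(poly)
    length = 2
    while length <= 128:
        k = 128 // length  # first twiddle index used by this layer in ntt()
        for start in range(0, N, 2 * length):
            w_inv = pow(ZETAS[k], -1, Q)  # modular inverse (Python 3.8+)
            k += 1
            for j in range(start, start + length):
                r[j], r[j + length] = butterfly(
                    r[j], r[j + length], w_inv, inverse=True
                )
        length *= 2
    # Each of the 7 layers doubles the values, so scale by 128^-1 at the end.
    scale = pow(128, -1, Q)
    return [(x * scale) % Q for x in r]


if __name__ == "__main__":
    sample = [(7 * i + 3) % Q for i in range(N)]
    print(intt(ntt(sample)) == sample)
```

Both transforms share the same nested loop structure and differ only in the layer order, the twiddle factor, and the butterfly mode. This loosely mirrors the control-logic reuse described in the summary, although the actual hardware scheduling and memory layout are not reproduced here.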
ISSN: 0167-9260, 1872-7522
DOI: 10.1016/j.vlsi.2020.12.005