ECO-BIKE: Bridging the Gap Between PQC BIKE and GPU Acceleration
Published in: | IEEE Transactions on Information Forensics and Security 2024, Vol. 19, pp. 8952-8965 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | Advancements in quantum computing pose a threat to public-key cryptosystems, leading to the development of post-quantum cryptography. NIST is standardizing candidate algorithms, with BIKE, a code-based key encapsulation mechanism, among those under consideration. Performance is crucial in the NIST PQC standardization process, and researchers have introduced a range of optimization techniques for BIKE across various platforms. To the best of our knowledge, our Efficient CryptOgraphy BIKE (ECO-BIKE) represents the first attempt at optimizing the implementation of BIKE on a GPU architecture. In this paper, we introduce a comprehensive construction of a 3-threading parallel architecture tailored for the BIKE cryptosystem. This architecture covers computational tasks ranging from low-level to high-level operations. These include a parallel dense polynomial multiplication scheme with an improved memory access pattern and an improved XOR calculation, which forms the basis for a parallel execution framework for the entire BIKE algorithm. Targeted optimizations are implemented for specific modules (KEYGEN, ENCAPS, DECAPS), which collectively enhance the overall efficiency of the algorithm. ECO-BIKE achieves high throughput on the NVIDIA GeForce RTX 4090. In the 3-thread mode, the throughput of the KEYGEN, ENCAPS, and DECAPS modules reaches 24.033 kops/s, 277.789 kops/s, and 5.817 kops/s, respectively. Our proposed optimal parallel multiplication scheme achieves a significantly higher overall throughput of 481.302 kops/s. These results highlight the substantial computational advantages our approach provides for cryptographic workloads. |
---|---|
ISSN: | 1556-6013 1556-6021 |
DOI: | 10.1109/TIFS.2024.3443617 |
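
For orientation: the dense polynomial multiplication mentioned in the abstract operates in the ring GF(2)[x]/(x^r - 1), where adding coefficients is a bitwise XOR. The following is a minimal sequential Python sketch of that arithmetic only. It is not the GPU scheme from the paper; the function name `gf2_mul_mod` and the integer-bitset representation are assumptions made for illustration.

```python
def gf2_mul_mod(a: int, b: int, r: int) -> int:
    """Multiply two polynomials over GF(2) modulo x^r - 1.

    Hypothetical helper for illustration: bit i of each integer
    holds the coefficient of x^i.
    """
    mask = (1 << r) - 1
    a &= mask
    b &= mask
    acc = 0
    shifted = a  # equals a * x^i mod (x^r - 1) at iteration i
    for i in range(r):
        if (b >> i) & 1:
            acc ^= shifted  # coefficient addition in GF(2) is XOR
        # Multiplying by x modulo x^r - 1 is a cyclic left rotation by one bit.
        shifted = ((shifted << 1) & mask) | (shifted >> (r - 1))
    return acc


# (1 + x) * x^4 = x^4 + x^5, and x^5 wraps around to 1 when r = 5.
product = gf2_mul_mod(0b00011, 0b10000, 5)
```

Each set bit of `b` contributes one rotated copy of `a`, and the copies are independent of one another, which is the kind of structure a parallel implementation can distribute across threads before combining partial results with XOR.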