High-Performance ECC Scalar Multiplication Architecture Based on Comb Method and Low-Latency Window Recoding Algorithm
Published in: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024-02, Vol. 32 (2), pp. 382-395 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Elliptic curve scalar multiplication (ECSM) is the essential operation in elliptic curve cryptography (ECC) for achieving high performance and security. We introduce a novel high-performance ECSM architecture over binary fields to meet the growing demand for performance and security. A low-latency window (LLW) recoding algorithm for hardware implementation is proposed to enhance the resistance toward side-channel attacks (SCAs). Based on the LLW algorithm, we propose an enhanced comb method for ECSM with a unified point addition (PA) and point doubling (PD) pattern. The theoretical analysis demonstrates that the enhanced comb method with w = 4 strikes the balance of computation burden for both extreme cases. To achieve short clock cycle latency and high frequency, the data dependency of ECSM is thoroughly analyzed, and we explore a timing schedule with one two-stage pipelined Karatsuba multiplier accumulator (MAC). The datapath of the proposed architecture is well-designed, ensuring that the critical path (CP) only contains minimal logic primitives apart from the MAC. Besides, the ideal placement of pipeline stages for MAC is illustrated. The proposed architecture has been implemented on Xilinx Virtex-7 series field-programmable gate arrays (FPGAs) and performs ECSM in 2.51, 4.93, and 10.85 µs with 3422, 7983, and 20158 slices over GF(2^163), GF(2^283), and GF(2^571), respectively. Implementation results reveal that our design shows 53.60%, 39.36%, and 32.64% performance improvement over the existing state-of-the-art works, respectively. |
---|---|
ISSN: | 1063-8210, 1557-9999 |
DOI: | 10.1109/TVLSI.2023.3321772 |
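
The abstract mentions a Karatsuba multiplier over binary fields. As a rough software illustration of that general idea only (not the authors' pipelined hardware MAC, and not their LLW recoding or comb method), the sketch below multiplies two GF(2^163) elements stored as Python integers. The reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the one used by the NIST 163-bit binary curves and is an assumption here, since the abstract does not name one; the function names and the `threshold` parameter are likewise hypothetical.

```python
# Assumed reduction polynomial for GF(2^163) (NIST B-163/K-163):
# f(x) = x^163 + x^7 + x^6 + x^3 + 1. The abstract does not specify it.
M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, threshold: int = 32) -> int:
    """Carry-less product using one Karatsuba split per recursion level."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul_schoolbook(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    lo = clmul_karatsuba(a_lo, b_lo, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, threshold)
    # In characteristic 2, addition and subtraction are both XOR,
    # so the middle term needs three sub-products instead of four.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, threshold) ^ lo ^ hi
    return (hi << (2 * h)) ^ (mid << h) ^ lo


def reduce_mod_f(c: int) -> int:
    """Reduce modulo F by repeatedly cancelling the highest set bit."""
    while c.bit_length() > M:
        c ^= F << (c.bit_length() - 1 - M)
    return c


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^163) elements given as bit-vector integers."""
    return reduce_mod_f(clmul_karatsuba(a, b))
```

For example, `gf_mul(2, 1 << 162)` computes x · x^162 = x^163, which reduces to x^7 + x^6 + x^3 + 1 under the assumed polynomial. A hardware design would unroll the recursion into a fixed combinational or pipelined structure rather than recurse at run time.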