Accelerating Falcon on ARMv8

Falcon is one of the promising digital-signature finalists in NIST's ongoing Post-Quantum Cryptography (PQC) standardization. Computational efficiency in software and hardware is also one of the main criteria for PQC standardization. In this paper, we present an efficient Falcon softw...

Bibliographic details
Published in: IEEE access 2022, Vol.10, p.44446-44460
Main authors: Kim, Youngbeom, Song, Jingyo, Seo, Seog Chung
Format: Article
Language: eng
Online access: Full text
container_end_page 44460
container_issue
container_start_page 44446
container_title IEEE access
container_volume 10
creator Kim, Youngbeom
Song, Jingyo
Seo, Seog Chung
description Falcon is one of the promising digital-signature finalists in NIST's ongoing Post-Quantum Cryptography (PQC) standardization. Computational efficiency in software and hardware is also one of the main criteria for PQC standardization. In this paper, we present an efficient Falcon software implementation for the ARMv8 environment. Until now, most software optimization of PQC algorithms has been conducted on 32-bit ARM (Cortex-M4) and typical CPUs (Intel and AMD CPUs). However, ARMv8 processors, including the Cortex-A30, 50, and 70 series, are widely used in various IoT (Internet of Things) applications, edge computing devices, and OBUs (On Board Units) in autonomous driving cars. To optimize the performance of Falcon, we take full advantage of the NEON engine, a parallel processing unit in ARMv8 MCUs. The main computation in Falcon is polynomial multiplication in the complex number domain and the integer domain. Typically, FFT (Fast Fourier Transform)-based and NTT (Number Theoretic Transform)-based multiplication methods are used for efficient polynomial multiplication in the complex number domain and the integer domain, respectively. Thus, to enhance the overall performance of Falcon, we improve the FFT-based and NTT-based multiplication methods by utilizing the NEON engine in ARMv8. Specifically, we parallelize the overall process (FFT/NTT transformation, pointwise multiplication, and inverse FFT/NTT transformation) of the FFT-based and NTT-based polynomial multiplication methods by strategically utilizing the NEON engine and vector instructions. Furthermore, we minimize the number of redundant memory accesses during FFT/NTT-based polynomial multiplication by making the most of the available registers in the NEON engine. Through the proposed parallel FFT/NTT-based polynomial multiplications, the proposed Falcon software provides 15.1% (resp. 18.1%), 16.5% (resp. 17.1%), and 65.4% (resp. 69.4%) performance improvement in key pair generation, signing, and verification at security level 1 (resp. 5) compared with the reference Falcon implementation submitted to the final round of the NIST PQC competition. Furthermore, as far as we know, this is the first optimized implementation of Falcon on the ARMv8 environment.
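The three-stage pipeline named in the description (forward transform, pointwise multiplication, inverse transform) can be illustrated for the integer domain with a minimal scalar sketch. This is not the authors' NEON code. It assumes Falcon's modulus q = 12289 and the ring Z_q[x]/(x^n + 1). The function names (`find_psi`, `ntt`, `negacyclic_mul`, `schoolbook_negacyclic`) are hypothetical, and a plain recursive transform stands in for the vectorized butterflies the paper describes.

```python
Q = 12289  # Falcon's modulus; q - 1 = 2^12 * 3, so 2n-th roots of unity exist for n <= 2048


def find_psi(n, q=Q):
    """Return a primitive 2n-th root of unity mod q (n must be a power of two)."""
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1 means the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q=Q):
    """Recursive radix-2 transform; omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q           # butterfly, upper half
        out[k + n // 2] = (even[k] - t) % q  # butterfly, lower half
        w = w * omega % q
    return out


def negacyclic_mul(a, b, q=Q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via forward NTT, pointwise product, inverse NTT."""
    n = len(a)
    psi = find_psi(n, q)
    psi_inv = pow(psi, q - 2, q)
    omega = psi * psi % q
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    # Pre-scaling by powers of psi turns the cyclic convolution into a negacyclic one.
    a_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    b_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    c_t = [x * y % q for x, y in zip(a_t, b_t)]  # pointwise multiplication
    c = ntt(c_t, omega_inv, q)                   # inverse transform, still unscaled
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q=Q):
    """O(n^2) reference product in the same ring, used only as a cross-check."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # x^n wraps around to -1
    return c
```

In a vectorized implementation, each butterfly loop iteration is a natural candidate for processing several coefficients per instruction, which is the kind of parallelism the description attributes to the NEON engine. The sketch makes no claim about how the paper schedules registers or memory accesses.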
doi_str_mv 10.1109/ACCESS.2022.3169784
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9762260</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9762260</ieee_id><doaj_id>oai_doaj_org_article_4967944e970b42a1a1a182c2f3d77b73</doaj_id><sourcerecordid>2659347198</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-685749385a0e952ef669df55fcfcc7ccd87d97304f2f43b3c6cd40f1779c406a3</originalsourceid><addsrcrecordid>eNpNUE1LAzEQDaJgqf0F9VDwvGu-JzkuS6uFimD1HNJsUras3ZrdCv57U7cUZwbmMcx7MzyEpgTnhGD9WJTlfL3OKaY0Z0RqUPwKjWhCGRNMXv_Dt2jSdTucQqWRgBG6L5zzjY-2r_fb2cI2rt3PUhVvL9_qDt0E23R-cu5j9LGYv5fP2er1aVkWq8xxrPpMKgFcMyUs9lpQH6TUVRAiuOAcOFcpqDQwzAMNnG2Yk67iOBAAnQSkZWO0HHSr1u7MIdafNv6Y1tbmb9DGrbGxr13jDdcSNOdeA95waskpFXU0sApgAyxpPQxah9h-HX3Xm117jPv0vqFSaMaBaJW22LDlYtt10YfLVYLNyVUzuGpOrpqzq4k1HVi19_7C0CAplZj9Aoc4b1o</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2659347198</pqid></control><display><type>article</type><title>Accelerating Falcon on ARMv8</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Kim, Youngbeom ; Song, Jingyo ; Seo, Seog Chung</creator><creatorcontrib>Kim, Youngbeom ; Song, Jingyo ; Seo, Seog Chung</creatorcontrib><description>Falcon is one of the promising digital-signature algorithms in NIST's ongoing Post-Quantum Cryptography (PQC) standardization finalist. Computational efficiency regarding software and hardware is also the main criteria for PQC standardization. In this paper, we present an efficient Falcon software implementation on ARMv8 environment. Until now, most of the software optimization on PQC algorithms have been conducted on 32-bit ARM (Cortex-M4) and typical CPUs (Intel and AMD CPUs). 
However, ARMv8 including Cortex-A30, 50, and 70 series have been widely used for various IoT (Internet of Things) applications, Edge computing devices, and OBUs (On Board Units) in autonomous driving cars. For optimizing the performance of Falcon, we take full advantage of NEON engine which is a kind of parallel processing unit in ARMv8 MCU. The main computation in Falcon belongs to polynomial multiplications in Complex number domain and Integer domain. Typically, FFT (Fast Fourier Transformation)-based multiplication method and NTT (Number Theoriteic Transform)-based multiplication method have been widely used for efficient polynomial multiplications in Complex number domain and Integer domain, respectively. Thus, in order to enhance the overall performance of Falcon, we improve the FFT-based multiplication method and NTT-based multiplication method by utilizing NEON engine in ARMv8. Specifically, we parallelize the overall process (FFT/NTT transformation, pointwise multiplication, and inverse FFT/NTT transformation) of FFT-based polynomial multiplication method and NTT-based polynomial multiplication method with strategically utilizing the NEON engine and vector instructions. Furthermore, we minimize the number of redundant memory accesses during FFT/NTT-based polynomial multiplication by making the most of available registers in NEON engine. Through the proposed parallel FFT/NTT-based polynomial multiplications, the proposed Falcon software provides 15.1% (resp. 18.1%), 16.5% (resp. 17.1%), and 65.4% (resp. 69.4%) of performance improvement in keypair generation, signing, and verification at security level 1 (resp. 5) compared with the reference Falcon implementation submitted to the final round of NIST PQC competition. 
Furthermore, as far as we know, this is the first optimized implementation of Falcon on ARMv8 environment.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2022.3169784</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Addition &amp; subtraction ; Algorithms ; ARM/NEON processor ; ARMv8 ; Central processing units ; Complex numbers ; CPUs ; Cryptography ; Digital signatures ; Domains ; Edge computing ; Engines ; Falcon ; Fast Fourier transformations ; Integers ; Internet of Things ; memory optimization ; Multiplication ; Neon ; NIST PQC signature ; Optimization ; parallel implementation ; Parallel processing ; Polynomials ; Quantum cryptography ; Security ; Software ; Software algorithms ; Standardization</subject><ispartof>IEEE access, 2022, Vol.10, p.44446-44460</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-685749385a0e952ef669df55fcfcc7ccd87d97304f2f43b3c6cd40f1779c406a3</citedby><cites>FETCH-LOGICAL-c408t-685749385a0e952ef669df55fcfcc7ccd87d97304f2f43b3c6cd40f1779c406a3</cites><orcidid>0000-0001-8016-2808 ; 0000-0003-4715-8393</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9762260$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Kim, Youngbeom</creatorcontrib><creatorcontrib>Song, Jingyo</creatorcontrib><creatorcontrib>Seo, Seog Chung</creatorcontrib><title>Accelerating Falcon on ARMv8</title><title>IEEE access</title><addtitle>Access</addtitle><description>Falcon is one of the promising digital-signature algorithms in NIST's ongoing Post-Quantum Cryptography (PQC) standardization finalist. Computational efficiency regarding software and hardware is also the main criteria for PQC standardization. In this paper, we present an efficient Falcon software implementation on ARMv8 environment. Until now, most of the software optimization on PQC algorithms have been conducted on 32-bit ARM (Cortex-M4) and typical CPUs (Intel and AMD CPUs). However, ARMv8 including Cortex-A30, 50, and 70 series have been widely used for various IoT (Internet of Things) applications, Edge computing devices, and OBUs (On Board Units) in autonomous driving cars. For optimizing the performance of Falcon, we take full advantage of NEON engine which is a kind of parallel processing unit in ARMv8 MCU. The main computation in Falcon belongs to polynomial multiplications in Complex number domain and Integer domain. 
Typically, FFT (Fast Fourier Transformation)-based multiplication method and NTT (Number Theoriteic Transform)-based multiplication method have been widely used for efficient polynomial multiplications in Complex number domain and Integer domain, respectively. Thus, in order to enhance the overall performance of Falcon, we improve the FFT-based multiplication method and NTT-based multiplication method by utilizing NEON engine in ARMv8. Specifically, we parallelize the overall process (FFT/NTT transformation, pointwise multiplication, and inverse FFT/NTT transformation) of FFT-based polynomial multiplication method and NTT-based polynomial multiplication method with strategically utilizing the NEON engine and vector instructions. Furthermore, we minimize the number of redundant memory accesses during FFT/NTT-based polynomial multiplication by making the most of available registers in NEON engine. Through the proposed parallel FFT/NTT-based polynomial multiplications, the proposed Falcon software provides 15.1% (resp. 18.1%), 16.5% (resp. 17.1%), and 65.4% (resp. 69.4%) of performance improvement in keypair generation, signing, and verification at security level 1 (resp. 5) compared with the reference Falcon implementation submitted to the final round of NIST PQC competition. 
Furthermore, as far as we know, this is the first optimized implementation of Falcon on ARMv8 environment.</description><subject>Addition &amp; subtraction</subject><subject>Algorithms</subject><subject>ARM/NEON processor</subject><subject>ARMv8</subject><subject>Central processing units</subject><subject>Complex numbers</subject><subject>CPUs</subject><subject>Cryptography</subject><subject>Digital signatures</subject><subject>Domains</subject><subject>Edge computing</subject><subject>Engines</subject><subject>Falcon</subject><subject>Fast Fourier transformations</subject><subject>Integers</subject><subject>Internet of Things</subject><subject>memory optimization</subject><subject>Multiplication</subject><subject>Neon</subject><subject>NIST PQC signature</subject><subject>Optimization</subject><subject>parallel implementation</subject><subject>Parallel processing</subject><subject>Polynomials</subject><subject>Quantum cryptography</subject><subject>Security</subject><subject>Software</subject><subject>Software algorithms</subject><subject>Standardization</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1LAzEQDaJgqf0F9VDwvGu-JzkuS6uFimD1HNJsUras3ZrdCv57U7cUZwbmMcx7MzyEpgTnhGD9WJTlfL3OKaY0Z0RqUPwKjWhCGRNMXv_Dt2jSdTucQqWRgBG6L5zzjY-2r_fb2cI2rt3PUhVvL9_qDt0E23R-cu5j9LGYv5fP2er1aVkWq8xxrPpMKgFcMyUs9lpQH6TUVRAiuOAcOFcpqDQwzAMNnG2Yk67iOBAAnQSkZWO0HHSr1u7MIdafNv6Y1tbmb9DGrbGxr13jDdcSNOdeA95waskpFXU0sApgAyxpPQxah9h-HX3Xm117jPv0vqFSaMaBaJW22LDlYtt10YfLVYLNyVUzuGpOrpqzq4k1HVi19_7C0CAplZj9Aoc4b1o</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Kim, Youngbeom</creator><creator>Song, Jingyo</creator><creator>Seo, Seog Chung</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8016-2808</orcidid><orcidid>https://orcid.org/0000-0003-4715-8393</orcidid></search><sort><creationdate>2022</creationdate><title>Accelerating Falcon on ARMv8</title><author>Kim, Youngbeom ; Song, Jingyo ; Seo, Seog Chung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-685749385a0e952ef669df55fcfcc7ccd87d97304f2f43b3c6cd40f1779c406a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Addition &amp; subtraction</topic><topic>Algorithms</topic><topic>ARM/NEON processor</topic><topic>ARMv8</topic><topic>Central processing units</topic><topic>Complex numbers</topic><topic>CPUs</topic><topic>Cryptography</topic><topic>Digital signatures</topic><topic>Domains</topic><topic>Edge computing</topic><topic>Engines</topic><topic>Falcon</topic><topic>Fast Fourier transformations</topic><topic>Integers</topic><topic>Internet of Things</topic><topic>memory optimization</topic><topic>Multiplication</topic><topic>Neon</topic><topic>NIST PQC signature</topic><topic>Optimization</topic><topic>parallel implementation</topic><topic>Parallel processing</topic><topic>Polynomials</topic><topic>Quantum cryptography</topic><topic>Security</topic><topic>Software</topic><topic>Software algorithms</topic><topic>Standardization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kim, Youngbeom</creatorcontrib><creatorcontrib>Song, Jingyo</creatorcontrib><creatorcontrib>Seo, Seog Chung</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kim, Youngbeom</au><au>Song, Jingyo</au><au>Seo, Seog Chung</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Accelerating Falcon on ARMv8</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2022</date><risdate>2022</risdate><volume>10</volume><spage>44446</spage><epage>44460</epage><pages>44446-44460</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Falcon is one of the promising digital-signature algorithms in NIST's ongoing Post-Quantum Cryptography (PQC) standardization finalist. Computational efficiency regarding software and hardware is also the main criteria for PQC standardization. In this paper, we present an efficient Falcon software implementation on ARMv8 environment. Until now, most of the software optimization on PQC algorithms have been conducted on 32-bit ARM (Cortex-M4) and typical CPUs (Intel and AMD CPUs). 
However, ARMv8 including Cortex-A30, 50, and 70 series have been widely used for various IoT (Internet of Things) applications, Edge computing devices, and OBUs (On Board Units) in autonomous driving cars. For optimizing the performance of Falcon, we take full advantage of NEON engine which is a kind of parallel processing unit in ARMv8 MCU. The main computation in Falcon belongs to polynomial multiplications in Complex number domain and Integer domain. Typically, FFT (Fast Fourier Transformation)-based multiplication method and NTT (Number Theoriteic Transform)-based multiplication method have been widely used for efficient polynomial multiplications in Complex number domain and Integer domain, respectively. Thus, in order to enhance the overall performance of Falcon, we improve the FFT-based multiplication method and NTT-based multiplication method by utilizing NEON engine in ARMv8. Specifically, we parallelize the overall process (FFT/NTT transformation, pointwise multiplication, and inverse FFT/NTT transformation) of FFT-based polynomial multiplication method and NTT-based polynomial multiplication method with strategically utilizing the NEON engine and vector instructions. Furthermore, we minimize the number of redundant memory accesses during FFT/NTT-based polynomial multiplication by making the most of available registers in NEON engine. Through the proposed parallel FFT/NTT-based polynomial multiplications, the proposed Falcon software provides 15.1% (resp. 18.1%), 16.5% (resp. 17.1%), and 65.4% (resp. 69.4%) of performance improvement in keypair generation, signing, and verification at security level 1 (resp. 5) compared with the reference Falcon implementation submitted to the final round of NIST PQC competition. 
Furthermore, as far as we know, this is the first optimized implementation of Falcon on ARMv8 environment.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2022.3169784</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0001-8016-2808</orcidid><orcidid>https://orcid.org/0000-0003-4715-8393</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2022, Vol.10, p.44446-44460
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_9762260
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Addition & subtraction
Algorithms
ARM/NEON processor
ARMv8
Central processing units
Complex numbers
CPUs
Cryptography
Digital signatures
Domains
Edge computing
Engines
Falcon
Fast Fourier transformations
Integers
Internet of Things
memory optimization
Multiplication
Neon
NIST PQC signature
Optimization
parallel implementation
Parallel processing
Polynomials
Quantum cryptography
Security
Software
Software algorithms
Standardization
title Accelerating Falcon on ARMv8
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T16%3A46%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20Falcon%20on%20ARMv8&rft.jtitle=IEEE%20access&rft.au=Kim,%20Youngbeom&rft.date=2022&rft.volume=10&rft.spage=44446&rft.epage=44460&rft.pages=44446-44460&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2022.3169784&rft_dat=%3Cproquest_ieee_%3E2659347198%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2659347198&rft_id=info:pmid/&rft_ieee_id=9762260&rft_doaj_id=oai_doaj_org_article_4967944e970b42a1a1a182c2f3d77b73&rfr_iscdi=true