
Accelerating Falcon on ARMv8



Falcon is one of the promising digital-signature finalists in NIST's ongoing Post-Quantum Cryptography (PQC) standardization. Computational efficiency in software and hardware is one of the main criteria for PQC standardization. In this paper, we present an efficient software implementation of Falcon for the ARMv8 environment. Until now, most software optimization of PQC algorithms has been conducted on 32-bit ARM (Cortex-M4) and typical CPUs (Intel and AMD). However, ARMv8 processors, including the Cortex-A30, 50, and 70 series, are widely used in IoT (Internet of Things) applications, edge computing devices, and OBUs (On-Board Units) in autonomous vehicles. To optimize the performance of Falcon, we take full advantage of the NEON engine, the parallel (SIMD) processing unit in ARMv8 processors. The main computation in Falcon is polynomial multiplication in the complex-number domain and in the integer domain. FFT (Fast Fourier Transform)-based multiplication and NTT (Number Theoretic Transform)-based multiplication are widely used for efficient polynomial multiplication in the complex-number domain and the integer domain, respectively. Thus, to enhance the overall performance of Falcon, we improve both the FFT-based and the NTT-based multiplication methods by using the NEON engine in ARMv8. Specifically, we parallelize the whole process of each method (forward FFT/NTT transform, pointwise multiplication, and inverse FFT/NTT transform) with the NEON engine and its vector instructions. Furthermore, we minimize redundant memory accesses during FFT/NTT-based polynomial multiplication by making the most of the registers available in the NEON engine. With the proposed parallel FFT/NTT-based polynomial multiplications, the proposed Falcon software achieves performance improvements of 15.1% (resp. 18.1%), 16.5% (resp. 17.1%), and 65.4% (resp. 69.4%) in keypair generation, signing, and verification at security level 1 (resp. 5), compared with the reference Falcon implementation submitted to the final round of the NIST PQC competition. As far as we know, this is the first optimized implementation of Falcon for the ARMv8 environment.
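
The three stages named in the abstract (forward transform, pointwise multiplication, inverse transform) can be illustrated with a minimal scalar sketch of NTT-based multiplication in Z_q[x]/(x^n + 1) with Falcon's modulus q = 12289. This is not the paper's NEON code: it is plain Python, it uses a textbook recursive transform with a "twist" by powers of a 2n-th root of unity, and every function name (`find_psi`, `ntt`, `negacyclic_mul`, `schoolbook_negacyclic`) is a hypothetical one chosen for illustration. The paper's vectorized implementation may organize the transform quite differently.

```python
Q = 12289  # Falcon's integer modulus; Q - 1 = 2^12 * 3


def find_psi(n, q=Q):
    """Return a primitive 2n-th root of unity mod q (n must be a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        # psi^(2n) = 1 always; psi^n = -1 forces the order to be exactly 2n.
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q=Q):
    """Recursive radix-2 cyclic NTT; omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q              # butterfly: (e + w*o, e - w*o)
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def negacyclic_mul(a, b, q=Q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via forward NTT, pointwise product, inverse NTT."""
    n = len(a)
    psi = find_psi(n, q)
    psi_inv = pow(psi, q - 2, q)        # modular inverse by Fermat's little theorem
    omega = psi * psi % q
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    # Twist by psi^i so that a cyclic convolution computes the negacyclic one.
    ta = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    tb = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    fa, fb = ntt(ta, omega, q), ntt(tb, omega, q)      # forward transforms
    fc = [x * y % q for x, y in zip(fa, fb)]           # pointwise multiplication
    tc = ntt(fc, omega_inv, q)                         # inverse transform (unscaled)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(tc)]


def schoolbook_negacyclic(a, b, q=Q):
    """Quadratic-time reference: x^n wraps around to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```

In an optimized implementation the butterfly loop and the pointwise product are the natural candidates for vector instructions, since each iteration applies the same arithmetic to independent coefficients. Keeping coefficients in registers across several butterfly layers is one way to reduce the repeated memory accesses the abstract mentions.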

Keywords: neon engine; multiplication method; armv8; falcon; multiplication

Journal Title: IEEE Access
Year Published: 2022


