The Split-Radix Fast Fourier Transform has the same low arithmetic complexity as the related Conjugate Pair Fast Fourier Transform. Both transforms have an irregular datapath structure that is straightforwardly expressed only in recursive form. Furthermore, the conjugate pair variant has a complicated input indexing pattern, which forces existing iterative implementations to rely on precomputed tables. The conjugate pair variant does, however, allow the memory bandwidth to be optimized, because it requires only a single twiddle factor load per radix-4 butterfly. In existing algorithms, this benefit comes at the cost of additional precomputed tables or recursive function calls. In this paper we present two novel approaches that handle both the butterfly scheduling and the input index generation of the Conjugate Pair Fast Fourier Transform. The proposed algorithm is cache-friendly because it is depth-first and non-recursive, and it does not rely on precomputed index tables. To achieve this, we relate the butterfly execution pattern of the Split-Radix and Conjugate Pair FFTs to the binary carry sequence. Based on this finding, we describe how common integer arithmetic and bitwise operations can be used to perform input reordering and depth-first traversal of the transform datapath with $\mathcal{O}(1)$ space complexity.
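To give a feel for the role of the binary carry sequence, the sketch below uses a deliberately simplified setting: a plain radix-2 (complete binary tree) datapath, not the split-radix or conjugate pair structure treated in the paper. It is an illustration under that assumption rather than the proposed algorithm. The names `ruler`, `recursive_schedule` and `iterative_schedule` are hypothetical. The binary carry sequence is the number of trailing zero bits of n = 1, 2, 3, ..., and in this simplified setting it gives the number of combine stages that become ready after each leaf, so the depth-first order can be reproduced without recursion or tables.

```python
from typing import Iterator, List, Tuple

# A scheduling step: (kind, level, index within that level).
Step = Tuple[str, int, int]


def ruler(n: int) -> int:
    """Binary carry sequence: number of trailing zero bits of n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # n & -n isolates the lowest set bit; its bit length minus one is its position.
    return (n & -n).bit_length() - 1


def binary_carry_sequence(count: int) -> List[int]:
    """First `count` terms of the binary carry sequence."""
    return [ruler(n) for n in range(1, count + 1)]


def recursive_schedule(level: int, index: int = 0) -> Iterator[Step]:
    """Depth-first (post-order) schedule of a complete binary tree, recursively.

    Simplified radix-2 stand-in for an FFT datapath (an assumption of this sketch).
    """
    if level == 0:
        yield ("leaf", 0, index)
        return
    yield from recursive_schedule(level - 1, 2 * index)
    yield from recursive_schedule(level - 1, 2 * index + 1)
    yield ("combine", level, index)


def iterative_schedule(levels: int) -> Iterator[Step]:
    """Same schedule without recursion, using only a loop counter and bit operations."""
    for n in range(1, (1 << levels) + 1):
        yield ("leaf", 0, n - 1)
        # After the n-th leaf, exactly ruler(n) combine stages are complete.
        for k in range(1, ruler(n) + 1):
            yield ("combine", k, (n >> k) - 1)
```

The first eight terms returned by `binary_carry_sequence(8)` are 0, 1, 0, 2, 0, 1, 0, 3, and `iterative_schedule(levels)` yields the same steps in the same order as `recursive_schedule(levels)` while keeping only the loop counters as state.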