High-performance FFT algorithms for the Convex C4/XA supercomputer
Published in: | The Journal of Supercomputing, 1995-03, Vol. 9 (1-2), pp. 163-178 |
---|---|
Authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Some implementations of a power-of-two one-dimensional fast Fourier transform (FFT) on vector computers use radix-4 Stockham autosort kernels with a separate transpose step. This paper describes an algorithm that performs well for large FFTs on a Convex C4/XA vector supercomputer by using higher-radix kernels and folding the transpose step into the computational steps. For short transforms, a different algorithm is used that computes the FFT without storing any intermediate results to memory. Performance results for these techniques are given. |
---|---|
ISSN: | 0920-8542 (print), 1573-0484 (online) |
DOI: | 10.1007/BF01245402 |
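The abstract's baseline is the radix-4 Stockham autosort FFT. The sketch below is a minimal illustration of that kernel family, not the paper's Convex C4/XA code. It runs radix-4 passes for a power-of-two length N and falls back to a single radix-2 pass when log2(N) is odd. Each pass reads one buffer and writes the other, so the output comes out in natural order without a bit-reversal step (the "autosort" property). The function name `stockham_fft` is hypothetical. The sketch does not model vectorization, the separate or fused transpose step, higher radices, or the register-resident short-transform algorithm the abstract mentions.

```python
import cmath


def stockham_fft(x):
    """Forward DFT (exp(-2*pi*i*j*k/N) convention) of a power-of-two-length
    sequence, computed with Stockham autosort passes (radix 4, plus one
    radix-2 pass when log2(N) is odd). Hypothetical illustrative helper."""
    n_total = len(x)
    if n_total == 0 or n_total & (n_total - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]
    dst = [0j] * n_total
    n, s = n_total, 1  # n: current sub-transform length, s: stride (number of sub-transforms)
    while n > 1:
        radix = 4 if n >= 4 else 2  # for powers of two, only n == 2 needs radix 2
        m = n // radix
        for p in range(m):
            w1 = cmath.exp(-2j * cmath.pi * p / n)  # twiddle factor omega_n^p
            if radix == 4:
                w2 = w1 * w1
                w3 = w2 * w1
                # Inner loop over q touches consecutive elements of src and dst.
                for q in range(s):
                    a = src[q + s * p]
                    b = src[q + s * (p + m)]
                    c = src[q + s * (p + 2 * m)]
                    d = src[q + s * (p + 3 * m)]
                    apc, amc = a + c, a - c
                    bpd, mjbmd = b + d, -1j * (b - d)  # -i*(b - d) from omega_4 = -i
                    dst[q + s * (4 * p + 0)] = apc + bpd
                    dst[q + s * (4 * p + 1)] = w1 * (amc + mjbmd)
                    dst[q + s * (4 * p + 2)] = w2 * (apc - bpd)
                    dst[q + s * (4 * p + 3)] = w3 * (amc - mjbmd)
            else:
                for q in range(s):
                    a = src[q + s * p]
                    b = src[q + s * (p + m)]
                    dst[q + s * (2 * p + 0)] = a + b
                    dst[q + s * (2 * p + 1)] = w1 * (a - b)
        src, dst = dst, src  # ping-pong: this pass's output feeds the next pass
        n, s = m, s * radix
    return src
```

For example, `stockham_fft([1, 2, 3, 4])` returns `[10, -2+2j, -2, -2-2j]` up to rounding. Note that the contiguous `q` loop has length `s`, which is 1 in the first pass and grows by the radix each pass. Short inner loops in early passes are one reason vector implementations reorganize the loop nest or the data layout.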