High-performance FFT algorithms for the Convex C4/XA supercomputer

Detailed description

Bibliographic details
Published in: The Journal of supercomputing, 1995-03, Vol. 9 (1-2), p. 163-178
Main authors: Wadleigh, Kevin R.; Gostin, Gary B.; Liu, John
Format: Article
Language: English
Online access: Full text
Description
Summary: Some implementations of a power-of-two one-dimensional fast Fourier transform (FFT) on vector computers use radix-4 Stockham autosort kernels with a separate transpose step. This paper describes an algorithm that performs well on large FFTs on a Convex C4/XA vector supercomputer by using higher-radix kernels and moving the transpose step into the computational steps. For short transforms, a different algorithm is used that calculates the FFT without storing any intermediate results to memory. Performance results using these techniques are given.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/BF01245402
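
For readers unfamiliar with the Stockham autosort formulation named in the summary, the following is a minimal illustrative sketch only. It uses radix-2 for brevity, not the radix-4 or higher-radix kernels the paper discusses, and it does not attempt to reproduce the paper's fused transpose, its short-transform algorithm, or anything specific to the Convex C4/XA. The function name `stockham_fft` is hypothetical. The point it shows is the "autosort" property: each pass reads from one buffer and writes to another in an order that leaves the final result naturally ordered, so no separate bit-reversal permutation is needed.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (forward transform, unnormalized).

    Illustrative sketch: ping-pongs between two buffers so that the
    output ends up in natural order without a bit-reversal pass.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    src = [complex(v) for v in x]
    dst = [0j] * n

    length = n   # size of the sub-transforms handled in this pass
    stride = 1   # number of interleaved sub-transforms
    while length > 1:
        half = length // 2
        for p in range(half):
            # Twiddle factor for butterfly p of a length-`length` transform.
            w = cmath.exp(-2j * cmath.pi * p / length)
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                # Writing to 2p and 2p+1 performs the reordering in place
                # of a separate permutation step.
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src   # swap the roles of the two buffers
        length = half
        stride *= 2
    return src
```

As a quick check, a unit impulse such as `stockham_fft([1, 0, 0, 0])` yields a flat spectrum with every bin equal to 1. A production kernel on a vector machine would run the inner loop over `q` as the vectorized dimension; this pure-Python version makes no such claim about performance.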