A high performance parallel algorithm for 1-D FFT
Main authors: | , , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Online access: | Full text |
Abstract: | In this paper we propose a high-performance parallel FFT algorithm based on a multidimensional formulation. We use it to solve a commonly encountered FFT-based kernel on a distributed-memory parallel machine, the IBM scalable parallel system SP1. The kernel requires a forward FFT of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT of the resulting data. We show that the multidimensional formulation reduces communication costs and also improves single-node performance by using the node's memory system effectively. We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine. |
---|---|
ISSN: | 1063-9535 |
DOI: | 10.1145/602770.602784 |
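
The kernel described in the abstract (forward FFT, multiplication by a coefficient array, inverse FFT) can be illustrated with a small serial sketch. This record does not give the paper's actual formulation, so the code below assumes the standard two-dimensional index splitting of a length n1*n2 transform (short transforms along one dimension, a twiddle-factor multiplication, then short transforms along the other). The names `dft`, `fft_2d_formulation`, `fft_kernel`, `n1`, and `n2` are hypothetical, the short transforms are naive DFTs for brevity, and nothing here models the SP1's inter-node communication.

```python
import cmath


def dft(x, sign=-1):
    """Naive O(n^2) DFT. sign=-1 is the forward transform, sign=+1 the
    unnormalized inverse. Stands in for an optimized short FFT."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_2d_formulation(x, n1, n2, sign=-1):
    """Length n1*n2 DFT computed as a two-dimensional problem.

    Input index  j = j1 + n1*j2, output index k = n2*k1 + k2
    (assumed splitting; not taken from the paper).
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n1 independent transforms of length n2 (over j2, giving k2).
    rows = [dft(x[j1::n1], sign) for j1 in range(n1)]

    # Step 2: twiddle factors exp(sign * 2*pi*i * j1 * k2 / n).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(sign * 2j * cmath.pi * j1 * k2 / n)

    # Step 3: n2 independent transforms of length n1 (over j1, giving k1).
    out = [0j] * n
    for k2 in range(n2):
        col = dft([rows[j1][k2] for j1 in range(n1)], sign)
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


def fft_kernel(x, coeff, n1, n2):
    """Forward FFT, pointwise multiply by coeff, inverse FFT (scaled by 1/n)."""
    n = n1 * n2
    spectrum = fft_2d_formulation(x, n1, n2, sign=-1)
    product = [s * c for s, c in zip(spectrum, coeff)]
    return [v / n for v in fft_2d_formulation(product, n1, n2, sign=+1)]
```

In a distributed setting, steps 1 and 3 each consist of independent short transforms, which is what makes a formulation of this kind attractive for spreading work across nodes; how the paper actually distributes data and schedules communication is not stated in this record.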