Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This paper evaluates the efficacy of recent commercial processing-in-memory
(PIM) solutions to accelerate fast Fourier transform (FFT), an important
primitive across several domains. Specifically, we observe that efficient
implementations of FFT on modern GPUs are memory bandwidth bound. As such, the
memory bandwidth boost availed by commercial PIM solutions makes a case for PIM
to accelerate FFT. To this end, we first deduce a mapping of FFT computation to
a strawman PIM architecture representative of recent commercial designs. We
observe that even with careful data mapping, PIM is not effective in
accelerating FFT. To address this, we make a case for collaborative
acceleration of FFT with PIM and GPU. Further, we propose software and hardware
innovations which lower PIM operations necessary for a given FFT. Overall, our
optimized PIM FFT mapping, termed Pimacolaba, delivers performance and data
movement savings of up to 1.38$\times$ and 2.76$\times$, respectively, over a
range of FFT sizes. |
---|---|
DOI: | 10.48550/arxiv.2308.03973 |