Adaptive block size for dense QR factorization in hybrid CPU–GPU systems via statistical modeling

Bibliographic details
Published in: Parallel Computing, May 2014, Vol. 40 (5–6), pp. 70–85
Main authors: Chen, Ray-Bing; Tsai, Yaohung M.; Wang, Weichung
Format: Article
Language: English
Description
Abstract:
• A CPU–GPU QR factorization is proposed that adaptively adjusts the block size.
• Our statistical auto-tuning procedure can find a near-optimal block size.
• The proposed online monitor can detect and avoid performance oscillations.
• The schemes are efficient and outperform existing methods.

QR factorization is a computational kernel of scientific computing. How can the latest computers be used to accelerate this task? We investigate this question by proposing a dense QR factorization algorithm with adaptive block sizes on a hybrid system that contains a central processing unit (CPU) and a graphics processing unit (GPU). To make full use of both the CPU and the GPU, we develop an adaptive scheme that chooses the block size at each iteration. The decision is based on statistical surrogate models of performance and on an online monitor that avoids occasional unexpected performance drops. We modify the highly optimized CPU–GPU QR factorization in MAGMA to implement the proposed schemes. Numerical results suggest that our approaches are efficient and lead to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky factorizations.
ISSN: 0167-8191, 1872-7336
DOI: 10.1016/j.parco.2014.03.001
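The abstract names two ingredients, a statistical surrogate model of performance and an online monitor, but does not give their form. The Python sketch below is only an illustration under stated assumptions, not the authors' method: it assumes a quadratic least-squares surrogate of panel throughput versus block size, and a hypothetical monitor rule that skips a block size for one iteration when its measured rate falls below `tolerance` times the prediction. All names (`QuadraticSurrogate`, `fit_quadratic`, `next_block_size`, `tolerance`) and the synthetic data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QuadraticSurrogate:
    """Hypothetical surrogate: predicted rate = c0 + c1*u + c2*u^2, with u = nb / scale."""
    c0: float
    c1: float
    c2: float
    scale: float

    def predict(self, nb: float) -> float:
        u = nb / self.scale
        return self.c0 + self.c1 * u + self.c2 * u * u


def fit_quadratic(block_sizes: Sequence[float], rates: Sequence[float]) -> QuadraticSurrogate:
    """Least-squares quadratic fit via the 3x3 normal equations (needs >= 3 distinct sizes)."""
    scale = max(abs(x) for x in block_sizes)  # rescale sizes to improve conditioning
    us = [x / scale for x in block_sizes]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, rates)) for k in range(3)]
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        acc = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - acc) / m[i][i]
    return QuadraticSurrogate(coef[0], coef[1], coef[2], scale)


def next_block_size(model: QuadraticSurrogate, candidates: Sequence[int],
                    last_nb: Optional[int] = None, last_predicted: float = 0.0,
                    last_measured: float = 0.0, tolerance: float = 0.8) -> int:
    """Pick the candidate block size with the highest predicted rate.

    Hypothetical monitor rule: if the previous iteration ran below
    tolerance * prediction, skip that block size for this iteration.
    """
    pool = list(candidates)
    if last_nb is not None and last_measured < tolerance * last_predicted:
        alternatives = [nb for nb in pool if nb != last_nb]
        if alternatives:
            pool = alternatives
    return max(pool, key=model.predict)


if __name__ == "__main__":
    sizes = [32, 64, 128, 192]
    rates = [100 + 2 * x - 0.01 * x * x for x in sizes]  # synthetic, not measured data
    model = fit_quadratic(sizes, rates)
    cands = [32, 64, 96, 128, 192]
    print(next_block_size(model, cands))                      # 96
    print(next_block_size(model, cands, 96, 199.84, 120.0))   # 128 (drop detected)
```

Rescaling the block sizes to `u = nb / scale` keeps the normal equations well conditioned; the exclusion rule is just one simple way to react to a detected drop, and a real scheme would fit its model to timings measured on the target CPU–GPU system.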