Automatic Tuning Technique Exploring Within the Hardware-Specific Constrained Parameters
Main authors: | , |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | This paper covers an efficient strategy for exploring the sampling parameters on auto-tuning processes. Byte/flop is considered as a performance indicator, and finding the best parameter is interpreted as an optimisation problem with some hardware-specific constrained conditions. In this work, we also evaluate the performance of various unrolled loops both in a rank-update operation and a matrix-vector multiplication which appear in a significant operation of an eigensolver. The tuned routines running on a single processor of a Hitachi SR8000 and a Fujitsu VPP5000 record 1080 MFLOPS and 8342 MFLOPS respectively. |
---|---|
ISSN: | 0302-9743, 1611-3349 |
DOI: | 10.1007/11666806_47 |
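
The idea in the summary of treating parameter search as a constrained optimisation over byte/flop can be illustrated with a minimal sketch. It is not taken from the paper: the cost model (one element of A per unrolled column, one load and one store of y, `unroll + 2` registers), the 8-byte word size, and the names `matvec_cost` and `best_unroll` are all assumptions chosen for illustration, with a register budget standing in for the hardware-specific constraint.

```python
from dataclasses import dataclass

BYTES_PER_WORD = 8  # assumption: double-precision operands


@dataclass(frozen=True)
class Candidate:
    unroll: int           # unrolling depth over columns (the tuning parameter)
    registers: int        # registers the unrolled loop body is assumed to need
    byte_per_flop: float  # modelled memory traffic per floating-point operation


def matvec_cost(unroll: int) -> Candidate:
    """Model one inner iteration of y[i] += sum_k A[i, j+k] * x[j+k].

    Hypothetical model: the `unroll` values of x stay in registers,
    so only A and y generate memory traffic.
    """
    words = unroll + 2        # `unroll` loads of A, one load and one store of y
    flops = 2 * unroll        # one multiply and one add per unrolled column
    registers = unroll + 2    # x values, plus y[i] and one temporary
    return Candidate(unroll, registers, BYTES_PER_WORD * words / flops)


def best_unroll(max_unroll: int, register_budget: int) -> Candidate:
    """Pick the depth with the lowest byte/flop that fits the register budget."""
    feasible = [
        c for c in map(matvec_cost, range(1, max_unroll + 1))
        if c.registers <= register_budget
    ]
    if not feasible:
        raise ValueError("no unrolling depth satisfies the register budget")
    return min(feasible, key=lambda c: c.byte_per_flop)


if __name__ == "__main__":
    print(best_unroll(max_unroll=16, register_budget=10))
```

Under this model byte/flop equals 4 + 8/unroll, so it falls as the depth grows and the constraint alone decides the result. A real tuner would presumably time only the few feasible candidates instead of sampling the whole parameter range.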