Superscalar GEMM-based level 3 BLAS—The on-going evolution of a portable and high-performance library
Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains inline what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 cac...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains inline what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 cache blocking and data copying of submatrix operands for the level 3 kernel. Our other BLAS’s which possess triangular operands, e.g., DTRSM, DSYRK use a similar level 3 kernel routine to handle the triangular blocks that appear on the diagonal of the larger input triangular operand. Like our previous GEMM-based work all other BLAS’s perform the dominating part of the computations in calls to DGEMM. We are seeing the adoption of our BLAS’s by several organizations, including the ATLAS and PHiPAC projects on automatic generation of fast DGEMM kernels for superscalar processors, and some computer vendors. The evolution of the superscalar GEMM-based level 3 BLAS is presented. Also, we describe new developments which include techniques that make the library applicable to symmetric multiprocessing (SMP) systems. |
---|---|
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/BFb0095338 |