Minimizing development and maintenance costs in supporting persistently optimized BLAS

Bibliographic Details
Published in: Software: Practice and Experience, 2005-02, Vol. 35 (2), p. 101-121
Main authors: Whaley, R. Clint; Petitet, Antoine
Format: Article
Language: English
Online access: Full text
Description
Abstract: The Basic Linear Algebra Subprograms (BLAS) define one of the most heavily used performance-critical APIs in scientific computing today. It has long been understood that the most important of these routines, the dense Level 3 BLAS, may be written efficiently given a highly optimized general matrix multiply routine. In this paper, however, we show that an even larger set of operations can be efficiently maintained using a much simpler matrix multiply kernel. Indeed, this is how our own project, ATLAS (which provides one of the most widely used BLAS implementations in use today), supports a large variety of performance-critical routines. Copyright © 2004 John Wiley & Sons, Ltd.
ISSN: 0038-0644; 1097-024X
DOI: 10.1002/spe.626
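
The abstract's central claim is that a wide range of Level 3 BLAS operations can be supported by layering them on a single matrix-multiply kernel. The C sketch below illustrates that layering under simplifying assumptions: the routine names (my_dgemm_kernel, my_dsyrk), the row-major storage, and the lack of transpose arguments are hypothetical choices made for illustration, not ATLAS's actual kernel interface.

/*
 * Hypothetical sketch: one generic matrix-multiply kernel, with a second
 * Level 3 operation (a SYRK-style rank-k update) built on top of it.
 */
#include <stdio.h>
#include <stdlib.h>

/* C := alpha*A*B + beta*C, with A m x k, B k x n, C m x n (row-major). */
static void my_dgemm_kernel(int m, int n, int k, double alpha,
                            const double *A, int lda,
                            const double *B, int ldb,
                            double beta, double *C, int ldc)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i * lda + p] * B[p * ldb + j];
            C[i * ldc + j] = alpha * sum + beta * C[i * ldc + j];
        }
}

/* SYRK-style update C := alpha*A*A^T + beta*C (updating all of C, not just
 * one triangle, to keep the example short).  It contains no multiply loop
 * of its own: it forms A^T and delegates to the generic kernel above. */
static void my_dsyrk(int n, int k, double alpha,
                     const double *A, int lda,
                     double beta, double *C, int ldc)
{
    double *At = malloc((size_t)k * n * sizeof *At);    /* A^T is k x n */
    if (!At) return;
    for (int i = 0; i < n; ++i)
        for (int p = 0; p < k; ++p)
            At[p * n + i] = A[i * lda + p];
    my_dgemm_kernel(n, n, k, alpha, A, lda, At, n, beta, C, ldc);
    free(At);
}

int main(void)
{
    const double A[2 * 3] = { 1, 2, 3,
                              4, 5, 6 };                /* 2 x 3 */
    double C[2 * 2] = { 0, 0, 0, 0 };                   /* 2 x 2 result */

    my_dsyrk(2, 3, 1.0, A, 3, 0.0, C, 2);               /* C = A*A^T */
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);   /* 14 32 / 32 77 */
    return 0;
}

A production implementation would avoid the explicit transpose copy and restrict the update to one triangle of C, but the structural point of the paper's approach survives even in this toy version: the higher-level routine needs no multiply code of its own.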