Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework
We approach the problem of implementing mixed-datatype support within the general matrix multiplication ( gemm ) operation of the BLAS-like Library Instantiation Software framework, whereby each matrix operand A , B , and C may be stored as single- or double-precision real or complex values. Another...
Gespeichert in:
Veröffentlicht in: | ACM transactions on mathematical software 2021-06, Vol.47 (2), p.1-26 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We approach the problem of implementing mixed-datatype support within the general matrix multiplication (
gemm
) operation of the BLAS-like Library Instantiation Software framework, whereby each matrix operand
A
,
B
, and
C
may be stored as single- or double-precision real or complex values. Another factor of complexity, whereby the matrix product and accumulation are allowed to take place in a precision different from the storage precisions of either
A
or
B
, is also discussed. We first break the problem into orthogonal dimensions, considering the mixing of domains separately from mixing precisions. Support for all combinations of matrix operands stored in either the real or complex domain is mapped out by enumerating the cases and describing an implementation approach for each. Supporting all combinations of storage and computation precisions is handled by typecasting the matrices at key stages of the computation—during packing and/or accumulation, as needed. Several optional optimizations are also documented. Performance results gathered on a 56-core Marvell ThunderX2 and a 52-core Intel Xeon Platinum demonstrate that high performance is mostly preserved, with modest slowdowns incurred from unavoidable typecast instructions. The mixed-datatype implementation confirms that combinatorial intractability is avoided, with the framework relying on only two assembly microkernels to implement 128 datatype combinations. |
---|---|
ISSN: | 0098-3500 1557-7295 |
DOI: | 10.1145/3402225 |