MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Hierarchical matrix approximations have gained significant traction in the machine learning and scientific communities because they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, the HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tunable accuracy in an evaluation phase. Existing implementations of HMatrix evaluations do not preserve locality and often lead to unbalanced parallel execution with high synchronization. Also, current solutions require the compression phase to re-execute if the kernel method or the required accuracy changes. In this work, we describe MatRox, a framework that uses novel structure analysis strategies, blocking and coarsening, together with code specialization and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplications. Modularization of the matrix compression phase enables the reuse of computations when the input accuracy or the kernel function changes. The MatRox-generated code for matrix-matrix multiplication is 2.98x, 1.60x, and 5.98x faster than the library implementations available in GOFMM, SMASH, and STRUMPACK, respectively. Additionally, the ability to reuse portions of the compression computation when the accuracy changes leads to an improvement of up to 2.64x with MatRox over GOFMM across five changes to accuracy.
DOI: 10.48550/arxiv.1812.07152
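As a rough illustration of the low-rank compression idea the abstract refers to, the sketch below replaces an off-diagonal block of a Gaussian kernel matrix with a truncated SVD factorization, so that applying the block to a vector costs O((m + n) r) instead of O(m n) flops. This is not MatRox's algorithm; the kernel choice, point sets, and tolerance are illustrative assumptions.

```python
# Minimal sketch (not MatRox's algorithm): compress an off-diagonal block of a
# Gaussian kernel matrix with a truncated SVD, then apply it at reduced cost.
# Kernel choice, point sets, and tolerance are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, h=1.0):
    """Dense kernel block K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h))

def compress_block(K, tol=1e-6):
    """Return factors (U, V) with K ~= U @ V, keeping singular values > tol * s_max."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))  # numerical rank at this tolerance
    return U[:, :r] * s[:r], Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 3))        # source points
Y = rng.standard_normal((300, 3)) + 8.0  # well-separated target points -> low rank
K = gaussian_kernel(X, Y)

U, V = compress_block(K, tol=1e-5)
x = rng.standard_normal(K.shape[1])

# Evaluation phase: U @ (V @ x) costs O((m + n) r) flops instead of O(m n).
err = np.linalg.norm(K @ x - U @ (V @ x)) / np.linalg.norm(K @ x)
print(f"rank {V.shape[0]} of min(m,n)={min(K.shape)}, relative error {err:.2e}")
```

Because the factors come from a single SVD, re-truncating to a different tolerance only requires choosing a different rank r from the same factorization, which loosely mirrors the kind of reuse across accuracy changes that the abstract highlights.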