SNCL: a supernode OpenCL implementation for hybrid computing arrays

Bibliographic details
Published in: The Journal of Supercomputing 2024-05, Vol.80 (7), p.9471-9493
Main authors: Tang, Tao; Lu, Kai; Peng, Lin; Cui, Yingbo; Fang, Jianbin; Huang, Chun; Wang, Ruibo; Yang, Canqun; Guo, Yifei
Format: Article
Language: English
Online access: Full text
Description
Summary: Heterogeneous computing continues to advance in high-performance computing because of its high performance and energy efficiency. More and more accelerators have emerged, such as GPUs, FPGAs, DSPs, and AI accelerators. Usually, an accelerator is connected to the host CPU as a peripheral device to form a tightly coupled heterogeneous computing node, and a parallel system is then built from multiple such nodes. This organization is computationally efficient but not flexible: when new accelerators appear, it is difficult to add them to a system that has already been built. At the hardware level, we create an array of accelerators and connect it to the existing system through a high-speed network. At the software level, we dynamically organize computing resources from various arrays to build a virtual heterogeneous computing node, and we provide a standard programming environment on top of it. The result is a more flexible, elastic, and scalable organization of heterogeneous computing. In this paper, a supernode OpenCL implementation (SNCL) is proposed for hybrid parallel computing systems. Virtual supernodes can be dynamically constructed across different computing arrays, and a standard OpenCL environment is implemented on top of RDMA communication over the high-speed interconnect. This environment can be combined with the system-level MPI programming environment to enable large-scale parallel computing on the hybrid array. SNCL is compatible with existing MPI/OpenCL programs, which run without modification. Experiments show that the runtime overhead of the supernode OpenCL environment is very low, and that it is suitable for deploying applications with high computing density and large data scale across different arrays, so that they can use the arrays' computing power without affecting scalability.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-023-05766-3
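
Illustrative note: the summary describes virtual supernodes that are assembled dynamically from devices in separate accelerator arrays. The following is a minimal sketch of that idea only, assuming a simple free-list allocator. It is not the SNCL API or the authors' method; the names `Array`, `Device`, `build_supernode`, and `release` are hypothetical, and RDMA transport, OpenCL, and MPI are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    array: str   # name of the accelerator array that owns the device
    kind: str    # accelerator type, e.g. "GPU" or "FPGA"
    index: int   # position of the device inside its array


@dataclass
class Array:
    """Hypothetical accelerator array reachable over the network."""
    name: str
    kind: str
    size: int
    allocated: set = field(default_factory=set)

    def free_count(self):
        return self.size - len(self.allocated)

    def acquire(self, count):
        # Hand out the lowest-numbered free devices.
        free = [i for i in range(self.size) if i not in self.allocated]
        chosen = free[:count]
        self.allocated.update(chosen)
        return [Device(self.name, self.kind, i) for i in chosen]


def release(arrays, devices):
    """Return devices to the arrays they came from."""
    by_name = {a.name: a for a in arrays}
    for dev in devices:
        by_name[dev.array].allocated.discard(dev.index)


def build_supernode(arrays, request):
    """Assemble a virtual node from several arrays.

    `request` maps an accelerator kind to the number of devices wanted.
    If any part of the request cannot be met, everything acquired so far
    is released again and RuntimeError is raised.
    """
    devices = []
    for kind, count in request.items():
        remaining = count
        for arr in arrays:
            if arr.kind != kind or remaining == 0:
                continue
            take = min(arr.free_count(), remaining)
            devices.extend(arr.acquire(take))
            remaining -= take
        if remaining:
            release(arrays, devices)  # roll back the partial allocation
            raise RuntimeError(f"not enough free {kind} devices")
    return devices
```

For example, with one four-device GPU array and one two-device FPGA array, `build_supernode(arrays, {"GPU": 3, "FPGA": 1})` returns four `Device` records, and `release(arrays, node)` returns them to their arrays so that another supernode can be assembled later.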