GPU-BASED ADAPTIVE BLAS OPERATION ACCELERATION APPARATUS AND METHOD THEREOF

Bibliographic Details
Main authors: HONG SEUNG TAE, SEOL JIN HO, KIM YOUNG JOO, KIM JEONG SI
Format: Patent
Language: eng ; kor
Description
Summary: Disclosed are a GPU-based apparatus for adaptively accelerating Basic Linear Algebra Subprograms (BLAS) operations, capable of accelerating machine learning in an embedded system, and a method thereof. According to the present invention, the apparatus comprises: a BLAS operation acceleration unit that uses machine learning data feature information and Open Computing Language (OpenCL) device information to set an optimal OpenCL parameter, and compiles kernel source code to generate a kernel in binary form; an OpenCL execution unit that uses OpenCL execution environment information and the optimal OpenCL parameter to create an OpenCL buffer for the BLAS operation, after which a GPU with access to that buffer performs the BLAS operation through the kernel, thereby accelerating machine learning in the embedded system; and an accelerator application unit that returns the result of the BLAS operation to a machine learning algorithm. (The record also carries a Korean-language abstract with the same content.)
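The flow described in the abstract (derive a parameter from data features and device information, build the kernel with it, run the BLAS operation, return the result) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record does not disclose which parameters are tuned or how, so the tile-size heuristic, the names `DeviceInfo`, `DataFeatures`, `choose_tile`, `build_options`, `tiled_gemm` and `run_blas`, and the `TILE` macro are hypothetical. The GPU kernel is replaced by a CPU loop so the sketch runs without an OpenCL runtime.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Hypothetical subset of OpenCL device information."""
    max_work_group_size: int  # cf. CL_DEVICE_MAX_WORK_GROUP_SIZE
    local_mem_bytes: int      # cf. CL_DEVICE_LOCAL_MEM_SIZE


@dataclass
class DataFeatures:
    """Hypothetical machine learning data features: operand shapes."""
    rows: int
    cols: int
    elem_bytes: int = 4  # assume 32-bit floats


def choose_tile(dev: DeviceInfo, data: DataFeatures) -> int:
    """Assumed heuristic: largest power-of-two square tile that fits
    the work-group limit, local memory (two tiles), and the data."""
    tile = 1
    while True:
        nxt = tile * 2
        fits_group = nxt * nxt <= dev.max_work_group_size
        fits_local = 2 * nxt * nxt * data.elem_bytes <= dev.local_mem_bytes
        fits_data = nxt <= max(data.rows, data.cols)
        if not (fits_group and fits_local and fits_data):
            return tile
        tile = nxt


def build_options(tile: int) -> str:
    """Build-option string that would be passed to the OpenCL compiler."""
    return f"-D TILE={tile}"


def tiled_gemm(a, b, tile):
    """CPU stand-in for the GPU kernel: C = A x B, computed tile by tile."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def run_blas(a, b, dev: DeviceInfo):
    """Select the parameter, 'build' the kernel, run it, return the result."""
    data = DataFeatures(rows=len(a), cols=len(b[0]))
    tile = choose_tile(dev, data)
    return tiled_gemm(a, b, tile), build_options(tile)


if __name__ == "__main__":
    device = DeviceInfo(max_work_group_size=256, local_mem_bytes=32768)
    result, options = run_blas([[1, 2], [3, 4]], [[5, 6], [7, 8]], device)
    print(options)
    print(result)
```

In a real OpenCL program, the string from `build_options` would be handed to the program build step and `tiled_gemm` would be an OpenCL C kernel reading from device buffers. The sketch only shows how a device-dependent parameter can be chosen before the kernel is built.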