System and method of controlling neural processing

A system of controlling neural processing based on deep reinforcement learning includes an agent module and an environment module. The agent module generates a plurality of agents based on layers included in a neural network model. Each agent repeatedly performs an iteration to determine a next acti...

Bibliographic details
Main authors:	YIM HAN YOUNG, KIM MIN SEONG, KO SANG SOO, JOO WOO HYUN, HA SANG HYUCK, LEE HYUK JOON
Format:	Patent
Language:	eng ; kor
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Summary:	A system of controlling neural processing based on deep reinforcement learning includes an agent module and an environment module. The agent module generates a plurality of agents based on a plurality of layers included in a neural network model. Each agent repeatedly performs an iteration to determine a next action among a plurality of candidate actions based on a reward value and a plurality of Q values corresponding to a present action, wherein the candidate actions indicate a change of a tiling condition of an input feature map of each layer. Each agent determines an optimal tiling condition of each layer based on the change in the reward value over the repeatedly performed iterations. The environment module generates the reward value and the plurality of Q values with respect to each layer based on a tiling condition corresponding to the present action, wherein the Q values indicate predicted reward values of the candidate actions.
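
The agent/environment loop described in the summary can be illustrated with a minimal sketch. It is not the patented implementation: the cost model (`BUFFER_CAPACITY`, `TILE_OVERHEAD`), the action set (`ACTIONS`), the starting tiling, and all class and function names are hypothetical. The Q values here are computed by a one-step lookahead over the toy cost model, which stands in for the deep network that would predict them in a deep reinforcement learning system, and each agent simply follows the highest Q value until the reward stops improving.

```python
from dataclasses import dataclass

# All constants below are assumptions made for illustration only.
BUFFER_CAPACITY = 64   # assumed on-chip buffer size, in feature-map elements
TILE_OVERHEAD = 10.0   # assumed fixed cost charged per tile
ACTIONS = ("keep", "double_h", "halve_h", "double_w", "halve_w")  # hypothetical


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # input feature map height
    width: int   # input feature map width


def ceil_div(a, b):
    return -(-a // b)


def apply_action(layer, tiling, action):
    """Return the tiling (tile_h, tile_w) that results from one candidate action."""
    th, tw = tiling
    if action == "double_h":
        th = min(th * 2, layer.height)
    elif action == "halve_h":
        th = max(th // 2, 1)
    elif action == "double_w":
        tw = min(tw * 2, layer.width)
    elif action == "halve_w":
        tw = max(tw // 2, 1)
    return th, tw


class Environment:
    """Scores a tiling condition and predicts the reward of each candidate action."""

    def reward(self, layer, tiling):
        th, tw = tiling
        if th * tw > BUFFER_CAPACITY:
            return -1e6  # tile does not fit the assumed buffer
        tiles = ceil_div(layer.height, th) * ceil_div(layer.width, tw)
        return -TILE_OVERHEAD * tiles

    def observe(self, layer, tiling):
        # Q values: predicted reward per candidate action (one-step lookahead
        # over the toy cost model, standing in for a learned predictor).
        reward = self.reward(layer, tiling)
        q_values = [self.reward(layer, apply_action(layer, tiling, a)) for a in ACTIONS]
        return reward, q_values


class Agent:
    """One agent per layer; iterates until the reward stops improving."""

    def __init__(self, layer, env):
        self.layer = layer
        self.env = env

    def run(self, max_iters=50):
        tiling = (1, 1)  # assumed initial tiling condition
        reward, q_values = self.env.observe(self.layer, tiling)
        for _ in range(max_iters):
            best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
            if q_values[best] <= reward:
                break  # no candidate action is predicted to improve the reward
            tiling = apply_action(self.layer, tiling, ACTIONS[best])
            reward, q_values = self.env.observe(self.layer, tiling)
        return tiling, reward


def generate_agents(layers, env):
    """Agent module: one agent per layer of the model."""
    return [Agent(layer, env) for layer in layers]


layers = [Layer("conv1", 16, 16), Layer("conv2", 8, 32)]
env = Environment()
result = {agent.layer.name: agent.run() for agent in generate_agents(layers, env)}
print(result)
```

Because the search is greedy, it can stop at a local optimum of whatever cost model is supplied; a learned Q function with exploration, as the phrase "deep reinforcement learning" suggests, would be one way to reduce that risk.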