FRACTIONAL-BIT QUANTIZATION AND DEPLOYMENT OF CONVOLUTIONAL NEURAL NETWORK MODELS

Bibliographic details
Main authors:	YAO, Anbang; CHENG, Liang; ZHANG, Yu; CHEN, Yurong; CHEN, Feng; LU, Ming; LIU, Miaoming; YANG, Yi; LIU, Bo; SHEN, Wanglei
Format:	Patent
Language:	English; French
Keywords:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Abstract:	The disclosure relates to fractional-bit network quantization and deployment of CNN models. An AI accelerator is disclosed, including: an input buffer configured to buffer an input image; a weight buffer configured to buffer convolutional kernel indexes for a convolutional layer of a CNN model; a kernel pattern buffer configured to buffer a 1-bit convolutional kernel subset for the convolutional layer of the CNN model, wherein the 1-bit convolutional kernel subset includes $2^{\tau}$ 1-bit convolutional kernels of size $K \times K$; a processing element (PE) array including one or more PE nodes, each of which is configured to generate convolutional results of an image region of the input image and the 1-bit convolutional kernels corresponding to the convolutional kernel indexes in the 1-bit convolutional kernel subset; and an output buffer configured to buffer the convolutional results of respective image regions of the input image and the 1-bit convolutional kernels corresponding to the convolutional kernel indexes.
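The following is a minimal sketch of one reading of the abstract, not the patented design. It assumes that a "1-bit" kernel has entries in {-1, +1}, that each weight-buffer entry is a τ-bit index selecting one of the $2^{\tau}$ patterns in the kernel pattern buffer, and that a PE computes a single-channel, stride-1, unpadded convolution (cross-correlation, as in common CNN frameworks). Under that reading, storing a τ-bit index in place of $K \times K$ one-bit weights costs $\tau / K^2$ bits per weight, which is where a fractional bit width can come from. The names `build_kernel_subset`, `pe_convolve` and `bits_per_weight` are hypothetical, and the way the subset is generated here (from the bits of each index) is a placeholder, since the abstract does not say how the subset is chosen.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[int]]


def build_kernel_subset(tau: int, k: int) -> List[Kernel]:
    """Hypothetical kernel pattern buffer: 2**tau distinct 1-bit K x K kernels.

    Assumption: 1-bit entries are -1/+1, and kernel i is derived from the
    binary digits of i. This is a placeholder, not the patent's selection rule.
    """
    assert tau <= k * k, "cannot pick more than 2**(K*K) distinct patterns"
    subset = []
    for i in range(2 ** tau):
        flat = [1 if (i >> b) & 1 else -1 for b in range(k * k)]
        subset.append([flat[r * k:(r + 1) * k] for r in range(k)])
    return subset


def pe_convolve(image: Image, subset: List[Kernel], index: int) -> Image:
    """One hypothetical PE: convolve a single-channel image with the 1-bit
    kernel that the tau-bit index selects (stride 1, no padding)."""
    kernel = subset[index]  # kernel lookup replaces per-weight storage
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            # Dot product of the K x K image region with the +/-1 kernel.
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def bits_per_weight(tau: int, k: int) -> float:
    """Storage cost when a tau-bit index stands in for K*K one-bit weights."""
    return tau / (k * k)


if __name__ == "__main__":
    subset = build_kernel_subset(tau=4, k=3)
    print(len(subset))            # 16 kernel patterns
    print(bits_per_weight(4, 3))  # 4/9 of a bit per weight
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(pe_convolve(image, subset, index=0))  # [[-54.0, -63.0], [-90.0, -99.0]]
```

With τ = 4 and K = 3, for example, each 3×3 kernel is stored as a 4-bit index, so the layer's weights cost 4/9 of a bit each, compared with 1 bit for a plain binary network. The per-kernel lookup also means the PE only needs the small pattern table plus the indexes, and not a full weight tensor.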