MGen: A Framework for Energy-Efficient In-ReRAM Acceleration of Multi-Task BERT

Recently, multiple transformer models, such as BERT, have been utilized together to support multiple natural language processing (NLP) tasks in a system, also known as multi-task BERT. Multi-task BERT with very high weight parameters increases the area requirement of a processing in resistive memory...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on computers 2023-11, Vol.72 (11), p.1-13
Hauptverfasser: Kang, Myeonggu, Shin, Hyein, Kim, Junkyum, Kim, Lee-Sup
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recently, multiple transformer models, such as BERT, have been utilized together to support multiple natural language processing (NLP) tasks in a system, also known as multi-task BERT. Multi-task BERT with very high weight parameters increases the area requirement of a processing in resistive memory (ReRAM) architecture, and several works have attempted to address this model size issue. Despite the reduced parameters, the number of multi-task BERT computations remains the same, leading to massive energy consumption in ReRAM-based deep neural network (DNN) accelerators. Therefore, we suggest a framework for better energy efficiency during the ReRAM acceleration of multi-task BERT. First, we analyze the inherent redundancies of multi-task BERT and the computational properties of the ReRAM-based DNN accelerator, after which we propose what is termed the model generator, which produces optimal BERT models supporting multiple tasks. The model generator reduces multi-task BERT computations while maintaining the algorithmic performance. Furthermore, we present task scheduler, which adjusts the execution order of multiple tasks, to run the produced models efficiently. As a result, the proposed framework achieves maximally 4.4× higher energy efficiency over the baseline, and it can also be combined with the previous multi-task BERT works to achieve both a smaller area and higher energy efficiency.
ISSN:0018-9340
1557-9956
DOI:10.1109/TC.2023.3288749