Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | To address the increasing compute demand of recent multi-model workloads with
heavy models such as large language models, we propose deploying heterogeneous
chiplet-based multi-chip module (MCM) accelerators. We develop an
advanced scheduling framework for heterogeneous MCM accelerators that
comprehensively considers complex heterogeneity and inter-chiplet pipelining.
Our experiments with the framework on GPT-2 and ResNet-50 models on a
4-chiplet system show up to 2.2x and 1.9x increases in throughput and
energy efficiency, respectively, compared to a monolithic accelerator with an optimized
output-stationary dataflow. |
---|---|
DOI: | 10.48550/arxiv.2312.09401 |