Thermal-Aware Design Space Exploration of 3-D Systolic ML Accelerators
Machine learning (ML) accelerators have a broad spectrum of use cases that pose different requirements on accelerator design for latency, energy, and area. In the case of systolic array-based ML accelerators, this puts different constraints on processing element (PE) array dimensions and SRAM buffer...
Gespeichert in:
Veröffentlicht in: | IEEE journal on exploratory solid-state computational devices and circuits 2021-06, Vol.7 (1), p.70-78 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Machine learning (ML) accelerators have a broad spectrum of use cases that pose different requirements on accelerator design for latency, energy, and area. In the case of systolic array-based ML accelerators, this puts different constraints on processing element (PE) array dimensions and SRAM buffer sizes. The 3-D integration packs more compute or memory in the same 2-D footprint, which can be utilized to build more powerful or energy-efficient accelerators. However, 3-D also expands the design space of ML accelerators by additionally including different possible ways of partitioning the PE array and SRAM buffers among the vertical tiers. Moreover, the partitioning approach may also have different thermal implications. This work provides a systematic framework for performing system-level design space exploration of 3-D systolic accelerators. Using this framework, different 3-D-partitioned accelerator configurations are proposed and evaluated. The 3-D-stacked accelerator designs are modeled using the hybrid wafer bonding technique with a 1.44- \mu \text{m} pitch of 3-D connection. Results show that different partitioning of the systolic array and SRAM buffers in a four-tier 3-D configuration can lead to either 1.1- 3.9\times latency reduction or 1- 3\times energy reduction compared to the baseline design of the same 2-D area footprint. It is also shown that by carefully organizing the systolic array and SRAM tiers using logic over memory, the temperature rise with 3-D across benchmarks can be limited to 6 °C. |
---|---|
ISSN: | 2329-9231 2329-9231 |
DOI: | 10.1109/JXCDC.2021.3092436 |