Tailor: Altering Skip Connections for Resource-Efficient Inference
| Published in: | ACM Transactions on Reconfigurable Technology and Systems, 2024-03, Vol. 17 (1), p. 1-23, Article 11 |
|---|---|
| Main Authors: | |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Full text |
Abstract: Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this article, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's skip connections are needed for the network to learn, they can later be removed or shortened to provide a more hardware-efficient implementation with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose hardware-aware training algorithm gradually removes or shortens a fully trained network's skip connections to lower the hardware cost. Tailor improves resource utilization by up to 34% for block random access memories (BRAMs), 13% for flip-flops (FFs), and 16% for look-up tables (LUTs) for on-chip, dataflow-style architectures. Tailor increases performance by 30% and reduces memory bandwidth by 45% for a two-dimensional processing element array architecture.
ISSN: 1936-7406, 1936-7414
DOI: 10.1145/3624990
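
The abstract describes the general idea of gradually removing or shortening a trained network's skip connections, but not the details of Tailor's algorithm. As a rough illustration of what such skip fading could look like, here is a minimal PyTorch sketch; the names (`ShortenableResidualBlock`, `alpha`, `anneal_skips`) and the linear annealing schedule are assumptions for illustration, not the paper's actual method or API.

```python
# Illustrative sketch only: fading out a residual block's skip connection
# during fine-tuning so it can be dropped at inference time.
import torch
import torch.nn as nn


class ShortenableResidualBlock(nn.Module):
    """Residual block whose identity path can be faded out (alpha: 1 -> 0)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # alpha scales the skip path; 1.0 = standard residual, 0.0 = skip removed.
        self.register_buffer("alpha", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.alpha * x)


def anneal_skips(model: nn.Module, step: int, total_steps: int) -> None:
    """Linearly decay every block's alpha so the skips vanish by the last step."""
    alpha = max(0.0, 1.0 - step / total_steps)
    for m in model.modules():
        if isinstance(m, ShortenableResidualBlock):
            m.alpha.fill_(alpha)
```

In a scheme like this, one would fine-tune the already-trained network while calling `anneal_skips` each step, giving the convolutional weights a chance to compensate as the identity path disappears; once alpha reaches zero, the skip buffers can be removed from the hardware implementation.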