Value State Flow Graph: A Dataflow Compiler IR for Accelerating Control-Intensive Code in Spatial Hardware

Although custom (and reconfigurable) computing can provide orders-of-magnitude improvements in energy efficiency and performance for many numeric, data-parallel applications, performance on nonnumeric, sequential code is often worse than conventional superscalar processors. This work attempts to imp...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on reconfigurable technology and systems 2016-02, Vol.9 (2), p.1-22
Hauptverfasser:	Zaidi, Ali Mustafa, Greaves, David
Format:	Artikel
Sprache:	eng
Schlagworte:	Chains Compilers Energy management Exposure Flow graphs Hardware Mathematical models Processors
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Although custom (and reconfigurable) computing can provide orders-of-magnitude improvements in energy efficiency and performance for many numeric, data-parallel applications, performance on nonnumeric, sequential code is often worse than conventional superscalar processors. This work attempts to improve sequential performance in custom hardware by (a) switching from a statically scheduled to a dynamically scheduled (dataflow) execution model and (b) developing a new compiler IR for high-level synthesis—the value state flow graph (VSFG)—that enables aggressive exposition of ILP even in the presence of complex control flow. Compared to existing control-data flow graph (CDFG)-based IRs, the VSFG exposes more instruction-level parallelism from control-intensive sequential code by exploiting aggressive speculation, enabling control dependence analysis, as well as execution along multiple flows of control. This new IR is directly implemented as a static-dataflow graph in hardware by our prototype high-level synthesis tool chain and shows an average speedup of 1.13× over equivalent hardware generated using LegUp, an existing CDFG-based HLS tool. Furthermore, the VSFG allows us to further trade area and energy for performance through loop unrolling, increasing the average speedup to 1.55×, with a peak speedup of 4.05×. Our VSFG-based hardware approaches the sequential cycle counts of an Intel Nehalem Core i7 processor while consuming only 0.25× the energy of an in-order Altera Nios II f processor.
ISSN:	1936-7406 1936-7414
DOI:	10.1145/2807702