Scientific Application Demands on a Reconfigurable Functional Unit Interface

Bibliographic details
Published in:	ACM Transactions on Reconfigurable Technology and Systems, 2011-05, Vol. 4 (2), pp. 1-30
Main authors:	Rupnow, Kyle; Underwood, Keith D.; Compton, Katherine
Format:	Article
Language:	English
Online access:	Full text

Description
Abstract:	Modern scientific applications are large, complex, and highly parallel; they are commonly executed on supercomputers with tens of thousands of processors. Yet these applications still commonly require weeks or even months to execute. Thus, single-thread performance remains a concern for highly parallel scientific applications. Adding a reconfigurable accelerator to each CPU could improve system performance; however, scientific applications have design constraints that differ from most application domains commonly accelerated by reconfigurable logic. In this article, we discuss the constraints imposed by scientific applications on the computation model, the accelerator architecture, and the accelerator’s communication interface with the CPU. Based on these constraints and application analysis, we have previously proposed adding a Reconfigurable Functional Unit (RFU) to accelerate integer graphs that calculate complex memory addresses. In this work, we now propose a flexible multi-instruction interface technique that allows dataflow graphs implemented on the RFU to access a large number of inputs and outputs with minor CPU datapath modifications. We present an in-depth examination of the performance effects of different communication interfaces that use this technique, and select one that best matches the needs of Sandia’s scientific applications. Although RFU execution overall improves performance, we also isolate two key negative performance effects introduced by aggregating CPU instructions into dataflow graphs: delayed issue and graph serialization. Finally, to demonstrate the marketability of an RFU beyond scientific applications, we reanalyze the proposed interfaces using the SPEC-fp benchmark suite. We show that although choosing an interface based on SPEC-fp needs is detrimental to Sandia application performance, choosing an interface based on Sandia demands works well for more general-purpose applications.
ISSN:	1936-7406, 1936-7414
DOI:	10.1145/1968502.1968510