Programming Many-Core Systems with GRAMPS

Bibliographic details
Main author: Sugerman, Jeremy
Format: Report
Language: English
Description
Summary: The era of obtaining increased performance via faster single cores and optimized single-thread programs is over. Instead, a major factor in new processors' performance comes from parallelism: increasing numbers of cores per processor and threads per core. At the same time, highly parallel GPU cores, initially developed for shading, are increasingly being adopted to offload and augment conventional CPUs, and vendors are already discussing chips that combine CPU and GPU cores. These trends are leading towards heterogeneous, commodity, many-core platforms with excellent potential performance, but also (not-so-excellent) significant actual complexity. In both research and industry, run-time systems, domain-specific languages, and, more generally, parallel programming models have become the tools to realize this performance and contain this complexity. In this dissertation, we present GRAMPS, a programming model for these heterogeneous, commodity, many-core systems that expresses programs as graphs of thread- and data-parallel stages communicating via queues. We validate its viability with respect to four design goals (broad application scope, multi-platform applicability, performance, and tunability) and demonstrate its effectiveness at minimizing the memory consumed by the queues. Through three case studies, we show applications for GRAMPS from domains including interactive graphics, MapReduce, physical simulation, and image processing, and describe GRAMPS runtimes for three many-core platforms: two simulated future rendering platforms and one current multi-core x86 machine. Our GRAMPS runtimes efficiently recognize and exploit the available parallelism while containing the footprint/buffering required by the queues.
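
Illustration: The summary describes programs as graphs of stages communicating via queues, with the memory consumed by those queues kept small. The sketch below is a generic illustration of that idea using only Python's standard library. It is not the GRAMPS API: the names run_pipeline, SENTINEL, and capacity are hypothetical, and the stages (produce, square, sum) are assumed purely for demonstration. Each queue is bounded, so a fast upstream stage blocks instead of letting buffered data grow without limit.

```python
import queue
import threading

# Hypothetical end-of-stream marker; an assumption of this sketch,
# not a GRAMPS concept.
SENTINEL = object()


def run_pipeline(items, capacity=4):
    """Run producer -> square -> sum stages linked by bounded queues."""
    # maxsize bounds how many items may sit in each queue at once.
    q_in = queue.Queue(maxsize=capacity)
    q_out = queue.Queue(maxsize=capacity)
    result = []

    def producer():
        for x in items:
            q_in.put(x)  # blocks while q_in is full
        q_in.put(SENTINEL)

    def square_stage():
        while True:
            x = q_in.get()
            if x is SENTINEL:
                q_out.put(SENTINEL)  # forward end-of-stream downstream
                return
            q_out.put(x * x)

    def sum_stage():
        total = 0
        while True:
            x = q_out.get()
            if x is SENTINEL:
                result.append(total)
                return
            total += x

    stages = (producer, square_stage, sum_stage)
    threads = [threading.Thread(target=stage) for stage in stages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Calling run_pipeline(range(10)) sums the squares of 0 through 9. Lowering capacity trades throughput for a smaller buffering footprint, which loosely mirrors the tension between parallelism and queue memory that the summary mentions.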