Memory-constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms
Format: | Article |
Language: | eng |
Abstract: | The increasing use of heterogeneous embedded systems with multi-core
CPUs and Graphics Processing Units (GPUs) presents important challenges in
effectively exploiting pipeline, task, and data-level parallelism to meet the
throughput requirements of digital signal processing (DSP) applications.
Moreover, in the presence of system-level memory constraints, hand optimization
of code to satisfy these requirements is inefficient and error-prone, and can
therefore greatly slow down development or leave processing resources highly
underutilized. In this paper, we present vectorization and scheduling methods
that effectively exploit multiple forms of parallelism for throughput
optimization on hybrid CPU-GPU platforms while conforming to system-level
memory constraints. The methods operate on synchronous dataflow
representations, which are widely used in the design of embedded systems for
signal and information processing. We show that our novel methods can
significantly improve system throughput compared to previous vectorization and
scheduling approaches under the same memory constraints. In addition, we
present a practical case study in which we apply our methods to significantly
improve the throughput of an orthogonal frequency division multiplexing (OFDM)
receiver system for wireless communications. |
---|---|
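The abstract's methods operate on synchronous dataflow (SDF) graphs, where each actor produces and consumes a fixed number of tokens per firing, and a consistent graph admits a repetitions vector obtained from the balance equations prod(a) * q[a] = cons(b) * q[b] for each edge (a, b). The following sketch illustrates that standard SDF computation; it is not code from the paper, and the graph encoding and function name are illustrative assumptions.

```python
from fractions import Fraction
from math import lcm

def repetitions_vector(actors, edges):
    """Compute the minimal integer repetitions vector of a consistent,
    connected SDF graph.

    actors: list of actor names.
    edges: list of (src, dst, prod_rate, cons_rate) tuples, meaning src
    produces prod_rate tokens and dst consumes cons_rate tokens per firing.
    """
    q = {a: None for a in actors}
    # Seed one actor with rate 1 and propagate rational firing rates
    # along edges until all actors are assigned.
    q[actors[0]] = Fraction(1)
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * Fraction(p, c)
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * Fraction(c, p)
                changed = True
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * scale) for a, f in q.items()}

# Example: actor A produces 2 tokens per firing on an edge whose
# consumer B takes 3 tokens per firing, so A must fire 3 times for
# every 2 firings of B in a periodic schedule.
print(repetitions_vector(["A", "B"], [("A", "B", 2, 3)]))  # {'A': 3, 'B': 2}
```

Vectorizing an SDF actor, in the sense used by the paper, amounts to scaling such firing counts by a blocking factor so that each invocation processes a batch of tokens at once, which trades buffer memory for throughput.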
DOI: | 10.48550/arxiv.1711.11154 |