Exploiting Thread‐Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Bibliographic Details
Published in: ETRI Journal, 2008, 30(4), pp. 576-586
Authors: Oh, Jaegeun; Hwang, Seok Joong; Nguyen, Huong Giang; Kim, Areum; Kim, Seon Wook; Kim, Chulwoo; Kim, Jong‐Kook
Format: Article
Language: English
Abstract: In most parallel loops of embedded applications, every iteration executes the same sequence of instructions while manipulating different data. This fact motivates a new compiler‐hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write‐back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between the single‐instruction multiple‐data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32‐bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2‐way MLEP and 33.7% faster with a 4‐way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler.
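To make the targeted loop structure concrete, the following minimal C sketch (not taken from the paper; the function and array names are hypothetical) shows the kind of data-parallel loop the abstract describes: every iteration runs the same instruction sequence on different data, so parallel threads can share a single fetch and decode unit while keeping their own execution, memory, and write-back units.

    /* Illustrative sketch only, not code from the paper.
     * Every iteration performs identical instructions on different data,
     * which is the loop shape MLEP executes in lockstep.
     */
    #include <stddef.h>

    void scale_and_offset(const int *in, int *out, size_t n, int scale, int offset)
    {
        /* An auto-parallelizing compiler (e.g., the Intel compiler used in the
         * paper's evaluation) can split this index range across threads; on MLEP,
         * each thread would receive the same decoded instruction stream but
         * operate on its own chunk of the arrays.
         */
        for (size_t i = 0; i < n; i++) {
            out[i] = in[i] * scale + offset;   /* same instructions, different data */
        }
    }

Because the per-iteration instruction stream is identical across threads, a shared fetch/decode front end can drive all of them in lockstep, which is the resource-sharing idea the abstract attributes to MLEP.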
ISSN: 1225-6463, 2233-7326
DOI: 10.4218/etrij.08.0107.0343