Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Bibliographic details
Published in: ETRI journal, 2008-02, Vol. 30 (4), pp. 576-586
Main authors: Oh, Jaeg-Eun; Hwang, Seok-Joong; Nguyen, Huong Giang; Kim, A-Reum; Kim, Seon-Wook; Kim, Chul-Woo; Kim, Jong-Kook
Format: Article
Language: kor
Online access: Full text
container_end_page 586
container_issue 4
container_start_page 576
container_title ETRI journal
container_volume 30
creator Oh, Jaeg-Eun
Hwang, Seok-Joong
Nguyen, Huong Giang
Kim, A-Reum
Kim, Seon-Wook
Kim, Chul-Woo
Kim, Jong-Kook
description In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend the commercial 32-bit embedded core AE32000C and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.
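The execution model in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical three-opcode ISA (not the AE32000C instruction set), a parallel loop `c[i] = a[i] + b[i]` whose trip count divides evenly among the threads, and a cyclic assignment of iterations to threads. The names `Lane`, `execute`, and `run_lockstep` are illustrative and are not taken from the paper. Each instruction is fetched and decoded once and then executed by every thread against its own registers, mirroring the shared fetch/decode units and private execution, memory, and write-back units described above.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA (NOT the AE32000C instruction set).
# Each instruction is (opcode, a, b, c):
#   ("load",  d, s, off) -> d = mem[s + off]
#   ("store", v, s, off) -> mem[s + off] = v   (first register is the value stored)
#   ("add",   d, s, t)   -> d = s + t          (all three are registers)
Instr = tuple[str, str, str, object]


@dataclass
class Lane:
    """Private per-thread state, standing in for the duplicated
    execution, memory, and write-back units of one thread."""
    tid: int
    regs: dict[str, int] = field(default_factory=dict)


def execute(lane: Lane, instr: Instr, mem: list[int]) -> None:
    """Execute one already-decoded instruction on one lane."""
    op, a, b, c = instr
    r = lane.regs
    if op == "load":
        r[a] = mem[r[b] + c]
    elif op == "store":
        mem[r[b] + c] = r[a]
    elif op == "add":
        r[a] = r[b] + r[c]
    else:
        raise ValueError(f"unknown opcode {op!r}")


def run_lockstep(body: list[Instr], n_lanes: int, trips: int,
                 mem: list[int]) -> int:
    """Run a loop body on n_lanes threads in lockstep; return the fetch/decode count.

    Thread t starts at iteration t and strides by n_lanes (cyclic assignment,
    an assumption of this sketch). All threads share one trip count, so the
    loop-back branch is uniform and needs no per-thread control flow.
    """
    lanes = [Lane(t, {"i": t, "step": n_lanes}) for t in range(n_lanes)]
    fetches = 0
    for _ in range(trips):
        for instr in body:
            fetches += 1                  # one shared fetch + decode ...
            for lane in lanes:            # ... issued to every lane in the same step
                execute(lane, instr, mem)
    return fetches


# Usage: c[i] = a[i] + b[i] for 8 elements; a at address 0, b at 8, c at 16.
N, A, B, C = 8, 0, 8, 16
BODY: list[Instr] = [
    ("load", "x", "i", A),
    ("load", "y", "i", B),
    ("add", "z", "x", "y"),
    ("store", "z", "i", C),
    ("add", "i", "i", "step"),    # advance to this thread's next iteration
]
results = {}
for ways in (1, 2, 4):
    mem = list(range(N)) + [10 + k for k in range(N)] + [0] * N
    fetches = run_lockstep(BODY, ways, N // ways, mem)
    results[ways] = (fetches, mem[C:C + N])
    print(ways, "way(s):", fetches, "fetch/decodes")
```

Every configuration produces the same result array, while the shared fetch/decode count drops from 40 (1-way) to 20 (2-way) to 10 (4-way), because each decoded instruction now serves several iterations at once. The sketch models only instruction issue, not pipeline timing or memory contention, so it does not reproduce the reported 13.5% and 33.7% speedups.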
format Article
publisher 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
date 2008-02-04
review peer-reviewed
access free to read
identifier ISSN: 1225-6463 (print); EISSN: 2233-7326 (online)
ispartof ETRI journal, 2008-02, Vol. 30 (4), pp. 576-586
language kor
recordid cdi_kisti_ndsl_JAKO200871242947537
source Wiley Free Content; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
title Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T09%3A01%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-kyobo_kisti&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploiting%20Thread-Level%20Parallelism%20in%20Lockstep%20Execution%20by%20Partially%20Duplicating%20a%20Single%20Pipeline&rft.jtitle=ETRI%20journal&rft.au=Oh,%20Jaeg-Eun&rft.date=2008-02-04&rft.volume=30&rft.issue=4&rft.spage=576&rft.epage=586&rft.pages=576-586&rft.issn=1225-6463&rft.eissn=2233-7326&rft_id=info:doi/&rft_dat=%3Ckyobo_kisti%3E4010022974157%3C/kyobo_kisti%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true