Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection

Pre-execution removes the microarchitectural latency of "problem" loads from a program's critical path by redundantly executing copies of their computations in parallel with the main program. There have been several proposed pre-execution systems, a quantitative framework (PTHSEL) for anal...

Full description

Saved in:
Bibliographic details
Main authors: Petric, Vlad; Roth, Amir
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 333
container_issue
container_start_page 322
container_title
container_volume
creator Petric, Vlad
Roth, Amir
description Pre-execution removes the microarchitectural latency of "problem" loads from a program's critical path by redundantly executing copies of their computations in parallel with the main program. There have been several proposed pre-execution systems, a quantitative framework (PTHSEL) for analytical pre-execution thread (p-thread) selection, and even a research prototype. To date, however, the energy aspects of pre-execution have not been studied. Cycle-level performance and energy simulations on SPEC2000 integer benchmarks that suffer from L2 misses show that energy-blind pre-execution naturally has a linear latency/energy trade-off, improving performance by 13.8% while increasing energy consumption by 11.9%. To improve this trade-off, we propose two extensions to PTHSEL. First, we replace the flat cycle-for-cycle load cost model with a model based on a critical-path estimation. This extension increases p-thread efficiency in an energy-independent way. Second, we add a parameterized energy model to PTHSEL (forming PTHSEL+E) that allows it to actively select p-threads that reduce energy rather than (or in combination with) execution latency. Experiments show that PTHSEL+E manipulates pre-execution's latency/energy more effectively. Latency-targeted selection benefits from the improved load cost model: its performance improvements grow to an average of 16.4% while energy costs drop to 8.7%. ED-targeted selection produces p-threads that improve performance by only 12.9%, but ED by 8.8%. Targeting p-thread selection for energy reduction results in "energy-free" pre-execution, with average speedup of 5.4%, and a small decrease in total energy consumption (0.7%).
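The latency/energy figures quoted in the description can be combined into an energy-delay (ED) estimate. The sketch below is illustrative only and is not taken from the paper: it assumes that "improving performance by X%" means a speedup of 1 + X/100 (so delay scales by the reciprocal) and that ED is the plain product of relative energy and relative delay. The function name `ed_change_pct` and the configuration labels are hypothetical.

```python
def ed_change_pct(speedup_pct: float, energy_change_pct: float) -> float:
    """Percent change in energy-delay product relative to the baseline.

    Negative values mean ED improved. Assumes (not stated in the record)
    that performance improvement is a speedup, so delay = 1 / (1 + speedup).
    """
    delay_ratio = 1.0 / (1.0 + speedup_pct / 100.0)
    energy_ratio = 1.0 + energy_change_pct / 100.0
    return (energy_ratio * delay_ratio - 1.0) * 100.0


# (speedup %, energy change %) as quoted in the description above.
# The labels are hypothetical names for the three configurations.
reported = {
    "energy-blind PTHSEL": (13.8, 11.9),
    "PTHSEL+E, latency-targeted": (16.4, 8.7),
    "PTHSEL+E, energy-targeted": (5.4, -0.7),
}

if __name__ == "__main__":
    for label, (speedup, energy) in reported.items():
        print(f"{label}: ED change {ed_change_pct(speedup, energy):+.1f}%")
```

The ED-targeted configuration is left out because the description reports its ED improvement (8.8%) directly rather than its energy change.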
doi_str_mv 10.1109/ISCA.2005.27
format Conference Proceeding
fullrecord <record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_acm_books_10_1109_ISCA_2005_27</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1431567</ieee_id><sourcerecordid>31403562</sourcerecordid><originalsourceid>FETCH-LOGICAL-a376t-42f2408978c36f1e74c01f12f2eab6525f66c5a51fbd336b66b1f8d5417fa0aa3</originalsourceid><addsrcrecordid>eNqNkE1PwzAMhiM-JMbYjRuXXuCCMuKkSbbjNBWYNIlJG9JuUdo6UOjakXTA_j2tth-AL5b8Prash5BrYEMANn6YLaeTIWdMDrk-IT0utaQaxPqUXDKtxpJzzdZnpAdMCapGY31BBiF8sLZiCRx4j8ySCv3bnibOYdYU31hhCFHtooVHmvxitmuKuopslUdHcvJjPUYLunr3aPNoiWW3WFdX5NzZMuDg2Pvk9TFZTZ_p_OVpNp3MqRVaNTTmjses_WWUCeUAdZwxcNBO0aZKcumUyqSV4NJcCJUqlYIb5TIG7SyzVvTJ3eHu1tdfOwyN2RQhw7K0Fda7YATETEjFW_DmABaIaLa-2Fi_NxALkEq36e0htdnGpHX9GQww02k1nVbTaTW84-7_w5nUF-jEH5-Qcs0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>31403562</pqid></control><display><type>conference_proceeding</type><title>Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Petric, Vlad ; Roth, Amir</creator><creatorcontrib>Petric, Vlad ; Roth, Amir</creatorcontrib><description>Pre-execution removes the microarchitectural latency of "problem" loads from a program's critical path by redundantly executing copies of their computations in parallel with the main program. There have been several proposed pre-execution systems, a quantitative framework (PTHSEL) for analytical pre-execution thread (p-thread) selection, and even a research prototype. To date, however, the energy aspects of pre-execution have not been studied. Cycle-level performance and energy simulations on SPEC2000 integer benchmarks that suffer from L2 misses show that energy-blind pre-execution naturally has a linear latency/energy trade-off, improving performance by 13.8% while increasing energy consumption by 11.9%. 
To improve this trade-off, we propose two extensions to PTHSEL. First, we replace the flat cycle-for-cycle load cost model with a model based on a critical-path estimation. This extension increases p-thread efficiency in an energy-independent way. Second, we add a parameterized energy model to PTHSEL (forming PTHSEL+E) that allows it to actively select p-threads that reduce energy rather than (or in combination with) execution latency. Experiments show that PTHSEL+E manipulates pre-execution's latency/energy more effectively. Latency-targeted selection benefits from the improved load cost model: its performance improvements grow to an average of 16.4% while energy costs drop to 8.7%. ED-targeted selection produces p-threads that improve performance by only 12.9%, but ED by 8.8%. Targeting p-thread selection for energy reduction results in "energy-free" pre-execution, with average speedup of 5.4%, and a small decrease in total energy consumption (0.7%).</description><identifier>ISSN: 1063-6897</identifier><identifier>ISBN: 076952270X</identifier><identifier>ISBN: 9780769522708</identifier><identifier>EISSN: 2575-713X</identifier><identifier>DOI: 10.1109/ISCA.2005.27</identifier><language>eng</language><publisher>Washington, DC, USA: IEEE Computer Society</publisher><subject>Computer aided instruction ; Computer architecture ; Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data ; Computing methodologies -- Modeling and simulation -- Model development and analysis -- Modeling methodologies ; Delay ; Energy consumption ; General and reference -- Cross-computing tools and techniques -- Design ; General and reference -- Cross-computing tools and techniques -- Performance ; Hardware ; Hardware -- Communication hardware, interfaces and storage ; Hardware -- Hardware validation ; Information science ; Multithreading ; Pareto optimization ; Prefetching ; Yarn</subject><ispartof>32nd International Symposium on 
Computer Architecture (ISCA'05), 2005, p.322-333</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1431567$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1431567$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Petric, Vlad</creatorcontrib><creatorcontrib>Roth, Amir</creatorcontrib><title>Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection</title><title>32nd International Symposium on Computer Architecture (ISCA'05)</title><addtitle>ISCA</addtitle><description>Pre-execution removes the microarchitectural latency of "problem" loads from a program's critical path by redundantly executing copies of their computations in parallel with the main program. There have been several proposed pre-execution systems, a quantitative framework (PTHSEL) for analytical pre-execution thread (p-thread) selection, and even a research prototype. To date, however, the energy aspects of pre-execution have not been studied. Cycle-level performance and energy simulations on SPEC2000 integer benchmarks that suffer from L2 misses show that energy-blind pre-execution naturally has a linear latency/energy trade-off, improving performance by 13.8% while increasing energy consumption by 11.9%. To improve this trade-off, we propose two extensions to PTHSEL. First, we replace the flat cycle-for-cycle load cost model with a model based on a critical-path estimation. This extension increases p-thread efficiency in an energy-independent way. 
Second, we add a parameterized energy model to PTHSEL (forming PTHSEL+E) that allows it to actively select p-threads that reduce energy rather than (or in combination with) execution latency. Experiments show that PTHSEL+E manipulates pre-execution's latency/energy more effectively. Latency-targeted selection benefits from the improved load cost model: its performance improvements grow to an average of 16.4% while energy costs drop to 8.7%. ED-targeted selection produces p-threads that improve performance by only 12.9%, but ED by 8.8%. Targeting p-thread selection for energy reduction results in "energy-free" pre-execution, with average speedup of 5.4%, and a small decrease in total energy consumption (0.7%).</description><subject>Computer aided instruction</subject><subject>Computer architecture</subject><subject>Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data</subject><subject>Computing methodologies -- Modeling and simulation -- Model development and analysis -- Modeling methodologies</subject><subject>Delay</subject><subject>Energy consumption</subject><subject>General and reference -- Cross-computing tools and techniques -- Design</subject><subject>General and reference -- Cross-computing tools and techniques -- Performance</subject><subject>Hardware</subject><subject>Hardware -- Communication hardware, interfaces and storage</subject><subject>Hardware -- Hardware validation</subject><subject>Information science</subject><subject>Multithreading</subject><subject>Pareto 
optimization</subject><subject>Prefetching</subject><subject>Yarn</subject><issn>1063-6897</issn><issn>2575-713X</issn><isbn>076952270X</isbn><isbn>9780769522708</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNqNkE1PwzAMhiM-JMbYjRuXXuCCMuKkSbbjNBWYNIlJG9JuUdo6UOjakXTA_j2tth-AL5b8Prash5BrYEMANn6YLaeTIWdMDrk-IT0utaQaxPqUXDKtxpJzzdZnpAdMCapGY31BBiF8sLZiCRx4j8ySCv3bnibOYdYU31hhCFHtooVHmvxitmuKuopslUdHcvJjPUYLunr3aPNoiWW3WFdX5NzZMuDg2Pvk9TFZTZ_p_OVpNp3MqRVaNTTmjses_WWUCeUAdZwxcNBO0aZKcumUyqSV4NJcCJUqlYIb5TIG7SyzVvTJ3eHu1tdfOwyN2RQhw7K0Fda7YATETEjFW_DmABaIaLa-2Fi_NxALkEq36e0htdnGpHX9GQww02k1nVbTaTW84-7_w5nUF-jEH5-Qcs0</recordid><startdate>20050101</startdate><enddate>20050101</enddate><creator>Petric, Vlad</creator><creator>Roth, Amir</creator><general>IEEE Computer Society</general><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20050101</creationdate><title>Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection</title><author>Petric, Vlad ; Roth, Amir</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a376t-42f2408978c36f1e74c01f12f2eab6525f66c5a51fbd336b66b1f8d5417fa0aa3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Computer aided instruction</topic><topic>Computer architecture</topic><topic>Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data</topic><topic>Computing methodologies -- Modeling and simulation -- Model development and analysis -- Modeling 
methodologies</topic><topic>Delay</topic><topic>Energy consumption</topic><topic>General and reference -- Cross-computing tools and techniques -- Design</topic><topic>General and reference -- Cross-computing tools and techniques -- Performance</topic><topic>Hardware</topic><topic>Hardware -- Communication hardware, interfaces and storage</topic><topic>Hardware -- Hardware validation</topic><topic>Information science</topic><topic>Multithreading</topic><topic>Pareto optimization</topic><topic>Prefetching</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Petric, Vlad</creatorcontrib><creatorcontrib>Roth, Amir</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Petric, Vlad</au><au>Roth, Amir</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection</atitle><btitle>32nd International Symposium on Computer Architecture 
(ISCA'05)</btitle><stitle>ISCA</stitle><date>2005-01-01</date><risdate>2005</risdate><spage>322</spage><epage>333</epage><pages>322-333</pages><issn>1063-6897</issn><eissn>2575-713X</eissn><isbn>076952270X</isbn><isbn>9780769522708</isbn><abstract>Pre-execution removes the microarchitectural latency of "problem" loads from a program's critical path by redundantly executing copies of their computations in parallel with the main program. There have been several proposed pre-execution systems, a quantitative framework (PTHSEL) for analytical pre-execution thread (p-thread) selection, and even a research prototype. To date, however, the energy aspects of pre-execution have not been studied. Cycle-level performance and energy simulations on SPEC2000 integer benchmarks that suffer from L2 misses show that energy-blind pre-execution naturally has a linear latency/energy trade-off, improving performance by 13.8% while increasing energy consumption by 11.9%. To improve this trade-off, we propose two extensions to PTHSEL. First, we replace the flat cycle-for-cycle load cost model with a model based on a critical-path estimation. This extension increases p-thread efficiency in an energy-independent way. Second, we add a parameterized energy model to PTHSEL (forming PTHSEL+E) that allows it to actively select p-threads that reduce energy rather than (or in combination with) execution latency. Experiments show that PTHSEL+E manipulates pre-execution's latency/energy more effectively. Latency-targeted selection benefits from the improved load cost model: its performance improvements grow to an average of 16.4% while energy costs drop to 8.7%. ED-targeted selection produces p-threads that improve performance by only 12.9%, but ED by 8.8%. 
Targeting p-thread selection for energy reduction results in "energy-free" pre-execution, with average speedup of 5.4%, and a small decrease in total energy consumption (0.7%).</abstract><cop>Washington, DC, USA</cop><pub>IEEE Computer Society</pub><doi>10.1109/ISCA.2005.27</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6897
ispartof 32nd International Symposium on Computer Architecture (ISCA'05), 2005, p.322-333
issn 1063-6897
2575-713X
language eng
recordid cdi_acm_books_10_1109_ISCA_2005_27
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computer aided instruction
Computer architecture
Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data
Computing methodologies -- Modeling and simulation -- Model development and analysis -- Modeling methodologies
Delay
Energy consumption
General and reference -- Cross-computing tools and techniques -- Design
General and reference -- Cross-computing tools and techniques -- Performance
Hardware
Hardware -- Communication hardware, interfaces and storage
Hardware -- Hardware validation
Information science
Multithreading
Pareto optimization
Prefetching
Yarn
title Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T07%3A47%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Energy-Effectiveness%20of%20Pre-Execution%20and%20Energy-Aware%20P-Thread%20Selection&rft.btitle=32nd%20International%20Symposium%20on%20Computer%20Architecture%20(ISCA'05)&rft.au=Petric,%20Vlad&rft.date=2005-01-01&rft.spage=322&rft.epage=333&rft.pages=322-333&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=076952270X&rft.isbn_list=9780769522708&rft_id=info:doi/10.1109/ISCA.2005.27&rft_dat=%3Cproquest_6IE%3E31403562%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=31403562&rft_id=info:pmid/&rft_ieee_id=1431567&rfr_iscdi=true