The impact of incorrectly speculated memory operations in a multithreaded architecture

Bibliographic details
Published in: IEEE transactions on parallel and distributed systems, 2005-03, Vol.16 (3), p.271-285
Main authors: Sendag, R.; Ying Chen; Lilja, D.J.
Format: Article
Language: English
container_end_page 285
container_issue 3
container_start_page 271
container_title IEEE transactions on parallel and distributed systems
container_volume 16
creator Sendag, R.
Ying Chen
Lilja, D.J.
description The speculative execution of threads in a multithreaded architecture, combined with the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they are actually needed by the program. In this study, we examine how load instructions executed on what turn out to be incorrectly executed program paths affect memory system performance. We find that incorrect speculation (wrong execution) at the instruction and thread levels provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute the mispredicted load instructions even after the instruction- or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase memory traffic and can pollute the cache. We introduce the small, fully associative wrong execution cache (WEC) to eliminate the potential pollution caused by executing the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture by up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, due to the reductions in the number of cache misses.
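The abstract describes the wrong execution cache only at a high level. The toy Python model below is a minimal sketch, not the authors' simulator: it assumes a direct-mapped L1, a two-entry fully associative WEC with LRU replacement, 64-byte blocks, promotion of a WEC hit into the L1, and a hypothetical trace format of `(byte_address, is_wrong_path)` pairs. All names (`DirectMappedL1`, `WrongExecutionCache`, `correct_path_misses`) are illustrative. It contrasts three policies: squashing mispredicted loads, letting them fill the L1 directly (prefetching plus pollution), and routing their fills into the WEC.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per cache block (hypothetical)


class DirectMappedL1:
    """Toy direct-mapped L1 data cache holding one block per set."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.tags = [None] * num_sets  # block number stored in each set

    def hit(self, block: int) -> bool:
        return self.tags[block % self.num_sets] == block

    def fill(self, block: int) -> None:
        # Installing a block silently evicts whatever occupied its set.
        self.tags[block % self.num_sets] = block


class WrongExecutionCache:
    """Small fully associative buffer with LRU replacement (assumed policy)."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.blocks = OrderedDict()  # block number -> None, ordered LRU to MRU

    def hit(self, block: int) -> bool:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def insert(self, block: int) -> None:
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.entries:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def take(self, block: int) -> None:
        self.blocks.pop(block, None)


def correct_path_misses(trace, mode: str, num_sets: int = 4, wec_entries: int = 2) -> int:
    """Count L1 misses seen by correct-path loads.

    mode: "squash" drops mispredicted loads, "l1" lets them fill the L1,
          "wec" routes their fills into the wrong execution cache instead.
    """
    if mode not in ("squash", "l1", "wec"):
        raise ValueError(f"unknown mode: {mode}")
    l1 = DirectMappedL1(num_sets)
    wec = WrongExecutionCache(wec_entries)
    misses = 0
    for addr, wrong_path in trace:
        block = addr // BLOCK_SIZE
        if wrong_path:
            if mode == "squash" or l1.hit(block):
                continue
            if mode == "l1":
                l1.fill(block)  # prefetches, but may evict a useful block
            else:
                wec.insert(block)  # L1 contents stay untouched
            continue
        if l1.hit(block):
            continue
        if mode == "wec" and wec.hit(block):
            wec.take(block)
            l1.fill(block)  # promote the prefetched block into the L1
            continue
        misses += 1
        l1.fill(block)
    return misses


trace = [(0, False), (256, True), (64, True), (0, False), (64, False)]
for mode in ("squash", "l1", "wec"):
    print(mode, correct_path_misses(trace, mode))
```

On this five-access trace, squashing and direct L1 fills each cost two correct-path misses (the direct fill prefetches block 1 but evicts block 0, which is needed again), while the WEC variant costs only the initial cold miss. The trace is contrived to show the prefetching and pollution effects side by side; it says nothing about the magnitudes reported in the study.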
doi_str_mv 10.1109/TPDS.2005.36
format Article
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005
coden ITDSEO
short_title TPDS
page_count 15
review_status peer reviewed
open_access free to read
ieee_document https://ieeexplore.ieee.org/document/1388216
fulltext fulltext_linktorsrc
identifier ISSN: 1045-9219
ispartof IEEE transactions on parallel and distributed systems, 2005-03, Vol.16 (3), p.271-285
issn 1045-9219
1558-2183
language eng
recordid cdi_crossref_primary_10_1109_TPDS_2005_36
source IEEE Electronic Library (IEL)
subjects Benchmark testing
Communication networks
Computer memory
Delay
Load
mispredicted loads
multithreaded architecture
Pipelines
Pollution
Prefetching
Registers
Speculation
Studies
System performance
Traffic control
wrong execution
wrong execution cache
title The impact of incorrectly speculated memory operations in a multithreaded architecture
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T14%3A20%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20impact%20of%20incorrectly%20speculated%20memory%20operations%20in%20a%20multithreaded%20architecture&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Sendag,%20R.&rft.date=2005-03&rft.volume=16&rft.issue=3&rft.spage=271&rft.epage=285&rft.pages=271-285&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2005.36&rft_dat=%3Cproquest_RIE%3E2581296051%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=920319793&rft_id=info:pmid/&rft_ieee_id=1388216&rfr_iscdi=true