The impact of incorrectly speculated memory operations in a multithreaded architecture
The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they are actually needed by the program. In this study, we examine how the load i...
Published in: | IEEE transactions on parallel and distributed systems 2005-03, Vol.16 (3), p.271-285 |
---|---|
Main authors: | Sendag, R., Ying Chen, Lilja, D.J. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 285 |
---|---|
container_issue | 3 |
container_start_page | 271 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 16 |
creator | Sendag, R. ; Ying Chen ; Lilja, D.J. |
description | The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they are actually needed by the program. In this study, we examine how the load instructions executed on what turn out to be incorrectly executed program paths impact the memory system performance. We find that incorrect speculation (wrong execution) at the instruction and thread levels provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute the mispredicted load instructions even after the instruction or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase the amount of memory traffic and can pollute the cache. We introduce the small, fully associative wrong execution cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture by up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, due to the reductions in the number of cache misses. |
doi_str_mv | 10.1109/TPDS.2005.36 |
format | Article |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005 |
eissn | 1558-2183 |
coden | ITDSEO |
tpages | 15 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2005-03, Vol.16 (3), p.271-285 |
issn | 1045-9219 ; 1558-2183 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TPDS_2005_36 |
source | IEEE Electronic Library (IEL) |
subjects | Benchmark testing ; Communication networks ; Computer memory ; Delay ; Load ; mispredicted loads ; multithreaded architecture ; Pipelines ; Pollution ; Prefetching ; Registers ; Speculation ; Studies ; System performance ; Traffic control ; wrong execution ; wrong execution cache |
title | The impact of incorrectly speculated memory operations in a multithreaded architecture |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T14%3A20%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20impact%20of%20incorrectly%20speculated%20memory%20operations%20in%20a%20multithreaded%20architecture&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Sendag,%20R.&rft.date=2005-03&rft.volume=16&rft.issue=3&rft.spage=271&rft.epage=285&rft.pages=271-285&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2005.36&rft_dat=%3Cproquest_RIE%3E2581296051%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=920319793&rft_id=info:pmid/&rft_ieee_id=1388216&rfr_iscdi=true |
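
The abstract describes the wrong execution cache (WEC) only as a small, fully associative cache that keeps mispredicted loads from polluting the regular cache while preserving their indirect prefetching effect. The following is a minimal sketch of that idea, not the authors' simulator. It makes several assumptions the record does not state: both caches use LRU replacement, the first-level cache is modeled as fully associative for simplicity, a WEC hit promotes the block into the first-level cache, and the baseline simply drops wrong-path loads. All names (`FullyAssociativeLRU`, `count_correct_path_misses`, the trace format) are hypothetical.

```python
from collections import OrderedDict


class FullyAssociativeLRU:
    """Tiny fully associative cache of block addresses (assumed LRU policy)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()  # block address -> None, oldest first

    def lookup(self, block):
        """Return True on a hit and mark the block most recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def insert(self, block):
        """Insert a block, evicting the least recently used one if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)
        self.blocks[block] = None

    def remove(self, block):
        """Drop a block if present."""
        self.blocks.pop(block, None)


def count_correct_path_misses(trace, l1_blocks, wec_blocks):
    """Count correct-path loads that miss in both L1 and the WEC.

    trace: iterable of (block_address, on_correct_path) pairs.
    wec_blocks == 0 models an assumed baseline that drops wrong-path loads.
    """
    l1 = FullyAssociativeLRU(l1_blocks)
    wec = FullyAssociativeLRU(wec_blocks) if wec_blocks > 0 else None
    misses = 0
    for block, on_correct_path in trace:
        if not on_correct_path:
            # Wrong-path load: fill only the WEC, so L1 is never polluted.
            if wec is not None and block not in l1.blocks:
                wec.insert(block)
            continue
        if l1.lookup(block):
            continue
        if wec is not None and wec.lookup(block):
            # Indirect prefetch hit: promote the block into L1 (assumption).
            wec.remove(block)
            l1.insert(block)
            continue
        misses += 1
        l1.insert(block)
    return misses


# Toy trace: blocks 2 and 3 are first touched on a mispredicted path.
toy_trace = [(1, True), (2, False), (3, False), (2, True), (3, True), (1, True)]
baseline_misses = count_correct_path_misses(toy_trace, l1_blocks=2, wec_blocks=0)
wec_misses = count_correct_path_misses(toy_trace, l1_blocks=2, wec_blocks=2)
print(baseline_misses, wec_misses)
```

On this toy trace the WEC variant sees fewer correct-path misses than the baseline because the two wrong-path loads act as prefetches. The numbers are purely illustrative and unrelated to the 16 to 73 percent reductions reported in the abstract.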