Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

Bibliographic details
Author:	Chi-Keung Luk
Format:	Conference proceeding
Language:	eng
Subjects:	Acceleration; Application software; Computer architecture; Delay; Hardware; Multithreading; Prefetching; Surface-mount technology; Vehicles; Yarn

container_end_page	51
container_issue
container_start_page	40
container_title
container_volume
creator	Chi-Keung Luk
description	Hard-to-predict data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures become increasingly popular, one attractive approach is to use idle threads on these machines to perform pre-execution, essentially a combined act of speculative address generation and prefetching, to accelerate the main thread. In this paper we propose such a pre-execution technique for simultaneous multithreading (SMT) processors. By using software to control pre-execution, we are able to handle some of the most important access patterns that are typically difficult to prefetch. Compared with existing work on pre-execution, our technique is significantly simpler to implement (e.g., no integration of pre-execution results, no need to shorten programs for pre-execution, and no need for special hardware to copy register values upon thread spawns). Consequently, only minimal extensions to SMT machines are required to support our technique. Despite its simplicity, our technique offers an average speedup of 24% in a set of irregular applications, which is a 19% speedup over state-of-the-art software-controlled prefetching.
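The description explains pre-execution only at a high level. As a rough illustration, here is a toy cycle-count model, not the paper's mechanism or simulator. A second SMT context runs only the address-generating code of a linked-list traversal (the pointer chase) and issues non-blocking prefetches for each node's payload. Meanwhile, the main thread performs blocking loads and some per-node work, and both contexts share one cache. The latencies (`HIT_LATENCY`, `MISS_LATENCY`, `WORK_CYCLES`), the unlimited-capacity cache, the memory layout, and every function name are hypothetical assumptions. The record does not describe the paper's actual thread-spawn instructions.

```python
"""Toy cycle-count model of software-controlled pre-execution on a
two-context SMT core. All latencies, names, and the cache model are
hypothetical assumptions for illustration, not taken from the paper."""
import heapq
from dataclasses import dataclass, field

HIT_LATENCY = 1     # assumed cycles for a cache hit
MISS_LATENCY = 100  # assumed cycles for a cache miss
WORK_CYCLES = 50    # assumed compute per node in the main thread


@dataclass
class Memory:
    next_of: dict = field(default_factory=dict)     # node addr -> next node addr
    payload_of: dict = field(default_factory=dict)  # node addr -> payload addr
    value_at: dict = field(default_factory=dict)    # payload addr -> value


@dataclass
class Cache:
    """Shared by both contexts; unlimited capacity (assumption)."""
    ready_at: dict = field(default_factory=dict)    # addr -> cycle the line arrives

    def load(self, addr, now):
        """Blocking load: return the cycle at which the data is usable."""
        self.ready_at.setdefault(addr, now + MISS_LATENCY)
        return max(now + HIT_LATENCY, self.ready_at[addr])

    def prefetch(self, addr, now):
        """Non-blocking prefetch: start the fill, cost one issue cycle."""
        self.ready_at.setdefault(addr, now + MISS_LATENCY)
        return now + HIT_LATENCY


def build_list(n):
    """Hypothetical layout: node i at address i, its payload at n + i."""
    mem = Memory()
    for i in range(n):
        mem.next_of[i] = i + 1 if i + 1 < n else None
        mem.payload_of[i] = n + i
        mem.value_at[n + i] = i
    return (0 if n else None), mem


def main_thread(head, mem, cache, out):
    """Main thread: walk the list, load each payload, do some work."""
    now, total, node = 0, 0, head
    while node is not None:
        now = cache.load(node, now)
        yield now
        payload = mem.payload_of[node]
        now = cache.load(payload, now) + WORK_CYCLES
        total += mem.value_at[payload]
        node = mem.next_of[node]
        yield now
    out["sum"], out["cycles"] = total, now


def preexecution_thread(head, mem, cache):
    """Pre-execution thread: run only the pointer chase that generates the
    payload addresses and prefetch them; it produces no results."""
    now, node = 0, head
    while node is not None:
        now = cache.load(node, now)  # must wait for the node to find next
        yield now
        now = cache.prefetch(mem.payload_of[node], now)
        yield now
        node = mem.next_of[node]


def run(head, mem, with_preexecution):
    """Interleave the contexts in cycle order; return (sum, main cycles)."""
    cache, out = Cache(), {}
    threads = [main_thread(head, mem, cache, out)]
    if with_preexecution:
        threads.append(preexecution_thread(head, mem, cache))
    # Heap of (cycle of next access, context id, generator); ties favor main.
    ready = [(0, tid, gen) for tid, gen in enumerate(threads)]
    heapq.heapify(ready)
    while ready:
        _, tid, gen = heapq.heappop(ready)
        try:
            heapq.heappush(ready, (next(gen), tid, gen))
        except StopIteration:
            pass  # this context has finished
    return out["sum"], out["cycles"]
```

For example, `run(*build_list(64), with_preexecution=True)` returns the same sum as the run without pre-execution, but in fewer simulated main-thread cycles. This is because the helper's node misses overlap with the main thread's payload misses and work. The pre-execution context never writes results back, which loosely mirrors the abstract's point that no integration of pre-execution results is needed.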
doi_str_mv	10.1109/ISCA.2001.937430
format	Conference Proceeding
isbn	0769511627 9780769511627
publisher	IEEE
identifier	ISSN: 1063-6897
ispartof	Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.40-51
issn	1063-6897 2575-713X
language	eng
recordid	cdi_ieee_primary_937430
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Acceleration; Application software; Computer architecture; Delay; Hardware; Multithreading; Prefetching; Surface-mount technology; Vehicles; Yarn
title	Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T21%3A28%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Tolerating%20memory%20latency%20through%20software-controlled%20pre-execution%20in%20simultaneous%20multithreading%20processors&rft.btitle=Proceedings%2028th%20Annual%20International%20Symposium%20on%20Computer%20Architecture&rft.au=Chi-Keung%20Luk&rft.date=2001&rft.spage=40&rft.epage=51&rft.pages=40-51&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1109/ISCA.2001.937430&rft_dat=%3Cieee_6IE%3E937430%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=937430&rfr_iscdi=true