Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems

This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resourc...

Bibliographic Details
Published in:	IEEE transactions on parallel and distributed systems 2009-09, Vol.20 (9), p.1309-1324
Main authors:	Lee, Jaejin ; Jung, Changhee ; Lim, Daeseob ; Solihin, Yan
Format:	Article
Language:	eng
Subjects:	Chemical-mechanical polishing ; chip multiprocessors ; Communication system control ; Content management ; Hardness ; Hardware ; Helper thread ; Intelligent systems ; Multiprocessing systems ; Multiprocessor ; Pollution ; Pollution abatement ; Prefetching ; processing-in-memory system ; Processors ; Random access memory ; Simulation ; Surface-mount technology

container_end_page	1324
container_issue	9
container_start_page	1309
container_title	IEEE transactions on parallel and distributed systems
container_volume	20
creator	Lee, Jaejin ; Jung, Changhee ; Lim, Daeseob ; Solihin, Yan
description	This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluating our scheme with nine memory-intensive applications with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the subset of applications with high L2 cache misses per cycle.
doi_str_mv	10.1109/TPDS.2008.224
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_35027356</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4641920</ieee_id><sourcerecordid>35027356</sourcerecordid><originalsourceid>FETCH-LOGICAL-c347t-455c45704c662d1aa0b00fdf37e687f6471ae00c12629c61e79aebf7231e93803</originalsourceid><addsrcrecordid>eNp90TtPwzAUBeAIgUQpjEwsEQNMKdfveETlUVARlVpmy3VvaFDaBDsR6r_HUREDA5Mt-dPRvT5Jck5gRAjom8Xsbj6iAPmIUn6QDIgQeUZJzg7jHbjINCX6ODkJ4QOAcAF8kDzPPBbYunW5fU-_ynadTrBq0KeLtUe7CmlR-3Ra1wGrXTquu6bCVfrSVW3Z-NphCPF5vgstbsJpclTYKuDZzzlM3h7uF-NJNn19fBrfTjPHuGozLoTjQgF3UtIVsRaWAMWqYAplrgrJFbEI4AiVVDtJUGmLy0JRRlCzHNgwud7nxgk-Owyt2ZTBYVXZLdZdMLkSQCWhvbz6V7IIFRMywss_8KPu_DZuYXQfxHJCI8r2yPk6hPhtpvHlxvqdIWD6AkxfgOkLMLGA6C_2vkTEX8slJzpGfgNOXYAO</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>912033812</pqid></control><display><type>article</type><title>Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems</title><source>IEEE Electronic Library Online</source><creator>Lee, Jaejin ; Jung, Changhee ; Lim, Daeseob ; Solihin, Yan</creator><creatorcontrib>Lee, Jaejin ; Jung, Changhee ; Lim, Daeseob ; Solihin, Yan</creatorcontrib><description>This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. 
This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluating our scheme with nine memory-intensive applications with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the subset of applications with high L2 cache misses per cycle.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2008.224</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Chemical-mechanical polishing ; chip multiprocessors ; Communication system control ; Content management ; Hardness ; Hardware ; Helper thread ; Intelligent systems ; Multiprocessing systems ; Multiprocessor ; Pollution ; Pollution abatement ; Prefetching ; processing-in-memory system ; Processors ; Random access memory ; Simulation ; Surface-mount technology</subject><ispartof>IEEE transactions on parallel and distributed systems, 2009-09, Vol.20 (9), p.1309-1324</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2009</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c347t-455c45704c662d1aa0b00fdf37e687f6471ae00c12629c61e79aebf7231e93803</citedby><cites>FETCH-LOGICAL-c347t-455c45704c662d1aa0b00fdf37e687f6471ae00c12629c61e79aebf7231e93803</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4641920$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,781,785,797,27926,27927,54760</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4641920$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lee, Jaejin</creatorcontrib><creatorcontrib>Jung, Changhee</creatorcontrib><creatorcontrib>Lim, Daeseob</creatorcontrib><creatorcontrib>Solihin, Yan</creatorcontrib><title>Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. 
We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluating our scheme with nine memory-intensive applications with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the subset of applications with high L2 cache misses per cycle.</description><subject>Chemical-mechanical polishing</subject><subject>chip multiprocessors</subject><subject>Communication system control</subject><subject>Content management</subject><subject>Hardness</subject><subject>Hardware</subject><subject>Helper thread</subject><subject>Intelligent systems</subject><subject>Multiprocessing systems</subject><subject>Multiprocessor</subject><subject>Pollution</subject><subject>Pollution abatement</subject><subject>Prefetching</subject><subject>processing-in-memory system</subject><subject>Processors</subject><subject>Random access memory</subject><subject>Simulation</subject><subject>Surface-mount 
technology</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNp90TtPwzAUBeAIgUQpjEwsEQNMKdfveETlUVARlVpmy3VvaFDaBDsR6r_HUREDA5Mt-dPRvT5Jck5gRAjom8Xsbj6iAPmIUn6QDIgQeUZJzg7jHbjINCX6ODkJ4QOAcAF8kDzPPBbYunW5fU-_ynadTrBq0KeLtUe7CmlR-3Ra1wGrXTquu6bCVfrSVW3Z-NphCPF5vgstbsJpclTYKuDZzzlM3h7uF-NJNn19fBrfTjPHuGozLoTjQgF3UtIVsRaWAMWqYAplrgrJFbEI4AiVVDtJUGmLy0JRRlCzHNgwud7nxgk-Owyt2ZTBYVXZLdZdMLkSQCWhvbz6V7IIFRMywss_8KPu_DZuYXQfxHJCI8r2yPk6hPhtpvHlxvqdIWD6AkxfgOkLMLGA6C_2vkTEX8slJzpGfgNOXYAO</recordid><startdate>20090901</startdate><enddate>20090901</enddate><creator>Lee, Jaejin</creator><creator>Jung, Changhee</creator><creator>Lim, Daeseob</creator><creator>Solihin, Yan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20090901</creationdate><title>Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems</title><author>Lee, Jaejin ; Jung, Changhee ; Lim, Daeseob ; Solihin, Yan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c347t-455c45704c662d1aa0b00fdf37e687f6471ae00c12629c61e79aebf7231e93803</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Chemical-mechanical polishing</topic><topic>chip multiprocessors</topic><topic>Communication system control</topic><topic>Content management</topic><topic>Hardness</topic><topic>Hardware</topic><topic>Helper thread</topic><topic>Intelligent systems</topic><topic>Multiprocessing 
systems</topic><topic>Multiprocessor</topic><topic>Pollution</topic><topic>Pollution abatement</topic><topic>Prefetching</topic><topic>processing-in-memory system</topic><topic>Processors</topic><topic>Random access memory</topic><topic>Simulation</topic><topic>Surface-mount technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Jaejin</creatorcontrib><creatorcontrib>Jung, Changhee</creatorcontrib><creatorcontrib>Lim, Daeseob</creatorcontrib><creatorcontrib>Solihin, Yan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) Online</collection><collection>IEEE Electronic Library Online</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Jaejin</au><au>Jung, Changhee</au><au>Lim, Daeseob</au><au>Solihin, Yan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems</atitle><jtitle>IEEE transactions on parallel and distributed 
systems</jtitle><stitle>TPDS</stitle><date>2009-09-01</date><risdate>2009</risdate><volume>20</volume><issue>9</issue><spage>1309</spage><epage>1324</epage><pages>1309-1324</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluating our scheme with nine memory-intensive applications with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. 
Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the subset of applications with high L2 cache misses per cycle.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2008.224</doi><tpages>16</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1045-9219
ispartof	IEEE transactions on parallel and distributed systems, 2009-09, Vol.20 (9), p.1309-1324
issn	1045-9219 1558-2183
language	eng
recordid	cdi_proquest_miscellaneous_35027356
source	IEEE Electronic Library Online
subjects	Chemical-mechanical polishing ; chip multiprocessors ; Communication system control ; Content management ; Hardness ; Hardware ; Helper thread ; Intelligent systems ; Multiprocessing systems ; Multiprocessor ; Pollution ; Pollution abatement ; Prefetching ; processing-in-memory system ; Processors ; Random access memory ; Simulation ; Surface-mount technology
title	Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T22%3A43%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prefetching%20with%20Helper%20Threads%20for%20Loosely%20Coupled%20Multiprocessor%20Systems&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Lee,%20Jaejin&rft.date=2009-09-01&rft.volume=20&rft.issue=9&rft.spage=1309&rft.epage=1324&rft.pages=1309-1324&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2008.224&rft_dat=%3Cproquest_RIE%3E35027356%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912033812&rft_id=info:pmid/&rft_ieee_id=4641920&rfr_iscdi=true